WarpFlow: Exploring Petabytes of Space-Time Data

Catalin Teodor Popescu

Deepak Merugu

Giao Nguyen

Shiva Shivakumar

(2019)


Abstract

WarpFlow is a fast, interactive querying and processing system for big data, with special treatment for petabyte-scale spatio-temporal datasets. It processes and transforms rich, hierarchical data end-to-end (e.g., Protocol Buffers, a common data format at Google). WarpFlow speeds up three key metrics for data scientists: time-to-first-result, time-to-full-scale-result, and time-to-trained-model for machine learning (e.g., using TensorFlow). In this paper, we describe the architecture and implementation of WarpFlow. We present a custom data storage format optimized for fast, index-based selection of hierarchical data. We also describe a functional, extensible, pipelined query language (with operators such as map, filter, aggregate, etc.) that greatly simplifies writing queries on big datasets with hierarchical data.
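To make the idea of a functional, pipelined query over hierarchical records concrete, here is a minimal Python sketch. It is not WarpFlow's query language or API, which the abstract does not show. The `Pipeline` class, the `TRIPS` records, and the field names (`city`, `points`, `speed`) are all hypothetical, and plain dictionaries stand in for Protocol Buffer messages.

```python
from collections import defaultdict

# Hypothetical hierarchical records: each trip nests a list of points.
TRIPS = [
    {"id": 1, "city": "SF", "points": [{"t": 0, "speed": 10.0}, {"t": 1, "speed": 14.0}]},
    {"id": 2, "city": "NY", "points": [{"t": 0, "speed": 5.0}]},
    {"id": 3, "city": "SF", "points": [{"t": 0, "speed": 20.0}, {"t": 1, "speed": 22.0}]},
]


class Pipeline:
    """Illustrative chainable pipeline; map and filter are lazy generators."""

    def __init__(self, source):
        self._source = source

    def map(self, fn):
        # Transform each record; nothing runs until the pipeline is consumed.
        return Pipeline(fn(r) for r in self._source)

    def filter(self, pred):
        # Keep only the records for which pred(record) is true.
        return Pipeline(r for r in self._source if pred(r))

    def aggregate(self, key, value):
        # Terminal stage: consume the stream and average value(r) per key(r).
        groups = defaultdict(list)
        for r in self._source:
            groups[key(r)].append(value(r))
        return {k: sum(v) / len(v) for k, v in groups.items()}


def mean_speed(trip):
    # Flatten the nested points of one trip into a single summary record.
    speeds = [p["speed"] for p in trip["points"]]
    return {"city": trip["city"], "mean_speed": sum(speeds) / len(speeds)}


result = (
    Pipeline(TRIPS)
    .filter(lambda t: len(t["points"]) >= 2)  # drop trips with too few points
    .map(mean_speed)
    .aggregate(key=lambda r: r["city"], value=lambda r: r["mean_speed"])
)
print(result)  # {'SF': 16.5}
```

Because each stage only wraps the previous iterator, records stream through filter and map one at a time and are materialized only at the aggregate step. This sketch runs on a single machine and does not model the index-based selection or distributed execution the abstract describes.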
