- Catalin Teodor Popescu
- Deepak Merugu
- Giao Nguyen
- Shiva Shivakumar
Abstract
WarpFlow is a fast, interactive querying and processing system for big data, with special treatment for petabyte-scale spatio-temporal datasets. It processes and transforms rich, hierarchical data end-to-end (e.g., Protocol Buffers, a common data format at Google). WarpFlow speeds up three key metrics for data scientists: time-to-first-result, time-to-full-scale-result, and time-to-trained-model for machine learning (e.g., using TensorFlow). In this paper, we describe the architecture and implementation of WarpFlow. We present a custom data storage format optimized for fast, index-based selection of hierarchical data. We also describe a functional, extensible, pipelined query language (with operators such as map, filter, and aggregate) that greatly simplifies writing queries on big datasets with hierarchical data.
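To give a feel for what a functional, pipelined query over hierarchical records can look like, the following is a minimal Python sketch. It is not WarpFlow's query language or API: the `Pipeline` class, the `trip_length` helper, and the sample `trips` records are all hypothetical, and nested dictionaries stand in for Protocol Buffer messages. Only the operator names map, filter, and aggregate come from the abstract.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable


class Pipeline:
    """Hypothetical lazy pipeline of chained operators (illustration only)."""

    def __init__(self, records: Iterable[Any]) -> None:
        self._records = records

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        # Transform each record; evaluation is deferred until aggregation.
        return Pipeline(fn(r) for r in self._records)

    def filter(self, pred: Callable[[Any], bool]) -> "Pipeline":
        # Keep only the records for which the predicate holds.
        return Pipeline(r for r in self._records if pred(r))

    def aggregate(
        self, key: Callable[[Any], Any], value: Callable[[Any], float]
    ) -> Dict[Any, float]:
        # Terminal operator: sum value(r) per key(r), consuming the pipeline.
        totals: Dict[Any, float] = defaultdict(float)
        for r in self._records:
            totals[key(r)] += value(r)
        return dict(totals)


# Made-up hierarchical records: each trip nests a route with repeated segments.
trips = [
    {"id": 1, "city": "SF", "route": {"segments": [{"km": 2.0}, {"km": 3.5}]}},
    {"id": 2, "city": "NYC", "route": {"segments": [{"km": 1.0}]}},
    {"id": 3, "city": "SF", "route": {"segments": [{"km": 4.0}, {"km": 0.5}]}},
]


def trip_length(trip: Dict[str, Any]) -> Dict[str, Any]:
    # Flatten the nested segments into a single per-trip distance.
    total_km = sum(seg["km"] for seg in trip["route"]["segments"])
    return {"city": trip["city"], "km": total_km}


km_by_city = (
    Pipeline(trips)
    .map(trip_length)
    .filter(lambda t: t["km"] >= 2.0)
    .aggregate(key=lambda t: t["city"], value=lambda t: t["km"])
)
print(km_by_city)
```

Each operator returns a new pipeline, so a query reads top to bottom as a chain of stages, and no records are touched until the terminal aggregate runs. A real system would additionally distribute these stages and use indexes to avoid scanning unselected data, which this sketch does not attempt.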