Trading off Accuracy for Speed in PowerDrill

Alex Hall; Alexandru Tudorica; Filip Buruiana; Reimar Hofmann; Silviu-Ionut Ganceanu; Thomas Hofmann

Trading off Accuracy for Speed in PowerDrill

Alex Hall

Alexandru Tudorica

Filip Buruiana

Reimar Hofmann

Silviu-Ionut Ganceanu

Thomas Hofmann

ICDE 2016: International Conference on Data Engineering, Chicago, USA, (Sep 19-20, 2016), World Academy of Science, Engineering and Technology, pp. 2121-2132

Google Scholar

Abstract

In-memory column-stores are becoming more and more widespread, making interactive data analysis feasible for ever growing amounts of data. However, data volume in many big data scenarios still exceeds the available memory and thus requires smart algorithms if interactive response times are desired. PowerDrill [13] is such a system which is used Google-internally for data discovery in log data. In this paper we present results from the “fast approximations” initiative, where we explore possibilities for optimizing performance at the expense of accuracy.

In one of the optimizations we use some of the theory behind Data-Sketches to considerably improve memory efficiency of the data encoding scheme. For particularly expensive fields in our largest datasets this saves a factor of 4.5 when compared to a standard production grade compression algorithm.

We additionally evaluate the effects of using sampling on accuracy and performance. We propose a simple heuristic for estimating the accuracy of individual result-values. We show that annotating result-values as accurate (or not) is essential for the usefulness of intermediate results over sub-samples. Only with these annotations our users can start interpreting intermediate results before final results are available. For a large set of queries this effectively brings down the 95th latency percentile from 30 to 4 seconds.

Research Areas

Algorithms and theory

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Trading off Accuracy for Speed in PowerDrill

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs