Storing and Querying Tree-Structured Records in Dremel

Foto N Afrati; Dan Delorey; Mosha Pasumansky; Jeffrey D. Ullman

Storing and Querying Tree-Structured Records in Dremel

Foto N Afrati

Dan Delorey

Mosha Pasumansky

Jeffrey D. Ullman

Proceedings of the VLDB Endowment, 7 (2014), pp. 1131-1142

Download Google Scholar

Abstract

In Dremel, data is stored as nested relations. The schema
for a relation is a tree, all of whose nodes are attributes, and whose leaf attributes hold values. We explore filter and aggregate queries that are given in the Dremel dialect of SQL. Complications arise because of repeated attributes, i.e., attributes that are allowed to have more than one value. We focus on the common class of Dremel queries that are processed on column-stored data in a way that results in query processing time that is linear on the size of the relevant data, i.e., data in the columns that participate in the query. We formally define the data model, the query language and the algorithms for query processing in column-stored data. The concepts of repetition context and semi-flattening are introduced here and play a central role in understanding this class of queries and their algorithms.

Research Areas

Information retrieval

Explore our many areas of focus

Building a collaborative ecosystem

Shaping the future together

Translating discovery into real-world impact

Storing and Querying Tree-Structured Records in Dremel

Abstract

Research Areas

Meet the teams driving innovation

Google AI

Google Cloud

Google DeepMind

Google Labs