Google Research

SageDB: A Learned Database System

  • Tim Kraska
  • Mohammad Alizadeh
  • Alex Beutel
  • Ed H. Chi
  • Jialin Ding
  • Ani Kristo
  • Guillaume Leclerc
  • Samuel Madden
  • Hongzi Mao
  • Vikram Nathan
CIDR (2019)

Abstract

Modern data processing systems are designed to be general purpose, in that they can handle a wide variety of schemas, data types, and data distributions, and aim to provide efficient access to that data through optimizers and cost models. This general-purpose nature results in systems that do not take advantage of the characteristics of the user's particular application and data. With SageDB we present a vision of a new type of data processing system, one that specializes heavily to an application through code synthesis and machine learning. By modeling the data distribution, workload, and hardware, SageDB learns the structure of the data and optimal access methods and query plans. These learned models are deeply embedded, through code synthesis, in essentially every component of the database. As such, SageDB presents a radical departure from the way database systems are currently developed, raising a host of new problems in databases, machine learning, and programming systems.
