Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 5 of 5 publications
    Recursive Programs for Document Spanners
    Balder ten Cate
    Benny Kimelfeld
    Liat Peterfreund
    Ronald Fagin
    Proceedings of the 22nd International Conference on Database Theory (ICDT 2019), LIPICS
    Preview abstract A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A well-studied language for expressing spanners is that of the regular spanners: relational algebra over regex formulas, which are regular expressions with capture variables. Equivalently, the regular spanners are the ones expressible in non-recursive Datalog over regex formulas (which extract relations that constitute the extensional database). This paper explores the expressive power of recursive Datalog over regex formulas. We show that such programs can express precisely the document spanners computable in polynomial time. We compare this expressiveness to known formalisms such as the closure of regex formulas under the relational algebra and string equality. Finally, we extend our study to a recently proposed framework that generalizes both the relational model and the document spanners. View details
    Spanner: Google's Globally-Distributed Database
    Michael Epstein
    Andrew Fikes
    Christopher Frost
    JJ Furman
    Andrey Gubarev
    Christopher Heiser
    Peter Hochschild
    Sebastian Kanthak
    Eugene Kogan
    Hongyi Li
    Sergey Melnik
    David Mwaura
    David Nagle
    Rajesh Rao
    Lindsay Rolig
    Dale Woodford
    Yasushi Saito
    Christopher Taylor
    Michal Szymaniak
    Ruth Wang
    OSDI (2012)
    Preview abstract Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner. View details
    Spanner: Google's Globally Distributed Database
    Michael Epstein
    Andrew Fikes
    Christopher Frost
    J. J. Furman
    Andrey Gubarev
    Christopher Heiser
    Sebastian Kanthak
    Eugene Kogan
    Hongyi Li
    Sergey Melnik
    David Mwaura
    David Nagle
    Rajesh Rao
    Lindsay Rolig
    Yasushi Saito
    Michal Szymaniak
    Christopher Taylor
    Ruth Wang
    Dale Woodford
    ACM Trans. Comput. Syst., 31 (2013), pp. 8
    Preview
    Spanner: Becoming a SQL System
    Nathan Bales
    Nico Bruno
    Brian F. Cooper
    Adam Dickinson
    Andrew Fikes
    Campbell Fraser
    Andrey Gubarev
    Milind Joshi
    Eugene Kogan
    Sergey Melnik
    Rajesh Rao
    Dave Shue
    Chris Taylor
    Marcel van der Holst
    Dale Woodford
    Proc. SIGMOD 2017, pp. 331-343 (to appear)
    Preview abstract Spanner is a globally-distributed data management system that backs hundreds of mission-critical services at Google. Spanner is built on ideas from both the systems and database communities. The first Spanner paper published at OSDI'12 focused on the systems aspects such as scalability, automatic sharding, fault tolerance, consistent replication, external consistency, and wide-area distribution. This paper highlights the database DNA of Spanner. We describe distributed query execution in the presence of resharding, query restarts upon transient failures, range extraction that drives query routing and index seeks, and the improved blockwise-columnar storage format. We touch upon migrating Spanner to the common SQL dialect shared with other systems at Google. View details
    Preview abstract Spanner is Google's highly available global-scale distributed database. It provides strong consistency for all transactions. This combination of availability and consistency over the wide area is generally considered impossible due to the CAP Theorem. We show how Spanner achieves this combination and why it is consistent with CAP. We also explore the role that TrueTime, Google's globally synchronized clock, plays in consistency for reads and especially for snapshots that enable consistent and repeatable analytics. View details