Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
Sort By
1 - 5 of 5 publications
Recursive Programs for Document Spanners
Balder ten Cate
Benny Kimelfeld
Liat Peterfreund
Ronald Fagin
Proceedings of the 22nd International Conference on Database Theory (ICDT 2019), LIPICS
Preview abstract
A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A well-studied language for expressing spanners is that of the regular spanners: relational algebra over regex formulas, which are regular expressions with capture variables. Equivalently, the regular spanners are the ones expressible in non-recursive Datalog over regex formulas (which extract relations that constitute the extensional database). This paper explores the expressive power of recursive Datalog over regex formulas. We show that such programs can express precisely the document spanners computable in polynomial time. We compare this expressiveness to known formalisms such as the closure of regex formulas under the relational algebra and string equality. Finally, we extend our study to a recently proposed framework that generalizes both the relational model and the document spanners.
View details
Spanner: Google's Globally-Distributed Database
Michael Epstein
Andrew Fikes
Christopher Frost
JJ Furman
Andrey Gubarev
Christopher Heiser
Peter Hochschild
Sebastian Kanthak
Eugene Kogan
Hongyi Li
Sergey Melnik
David Mwaura
David Nagle
Rajesh Rao
Lindsay Rolig
Dale Woodford
Yasushi Saito
Christopher Taylor
Michal Szymaniak
Ruth Wang
OSDI (2012)
Preview abstract
Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.
View details
Spanner: Google's Globally Distributed Database
Preview
Michael Epstein
Andrew Fikes
Christopher Frost
J. J. Furman
Andrey Gubarev
Christopher Heiser
Sebastian Kanthak
Eugene Kogan
Hongyi Li
Sergey Melnik
David Mwaura
David Nagle
Rajesh Rao
Lindsay Rolig
Yasushi Saito
Michal Szymaniak
Christopher Taylor
Ruth Wang
Dale Woodford
ACM Trans. Comput. Syst., 31 (2013), pp. 8
Spanner: Becoming a SQL System
Nathan Bales
Nico Bruno
Brian F. Cooper
Adam Dickinson
Andrew Fikes
Campbell Fraser
Andrey Gubarev
Milind Joshi
Eugene Kogan
Sergey Melnik
Rajesh Rao
Dave Shue
Chris Taylor
Marcel van der Holst
Dale Woodford
Proc. SIGMOD 2017, pp. 331-343 (to appear)
Preview abstract
Spanner is a globally-distributed data management system that backs hundreds of mission-critical services at Google. Spanner is built on ideas from both the systems and database communities. The first Spanner paper published at OSDI'12 focused on the systems aspects such as scalability, automatic sharding, fault tolerance, consistent replication, external consistency, and wide-area distribution. This paper highlights the database DNA of Spanner. We describe distributed query execution in the presence of resharding, query restarts upon transient failures, range extraction that drives query routing and index seeks, and the improved blockwise-columnar storage format. We touch upon migrating Spanner to the common SQL dialect shared with other systems at Google.
View details
Spanner, TrueTime and the CAP Theorem
Google (2017)
Preview abstract
Spanner is Google's highly available global-scale distributed database. It provides strong consistency for all transactions. This combination of availability and consistency over the wide area is generally considered impossible due to the CAP Theorem. We show how Spanner achieves this combination and why it is consistent with CAP. We also explore the role that TrueTime, Google's globally synchronized clock, plays in consistency for reads and especially for snapshots that enable consistent and repeatable analytics.
View details