David F. Bacon

David F. Bacon leads the design and evolution of the Spanner storage engine (Ressi) along with the exploitation of new hardware technologies in databases. The team is based in Google's New York City office. David received his A.B. from Columbia University in 1985 and his Ph.D. from U.C. Berkeley in 1997.

His prior work includes compilation and run-time systems for object-oriented programming, hardware compilation, and real-time garbage collection. He is a Fellow of the ACM, and has served on the governing boards of ACM SIGPLAN and SIGBED.

Authored Publications
    Building on the simplicity and power of declarative queries combined with strongly consistent transactional semantics has allowed the Spanner database to scale to many thousands of machines running an aggregate of over 2 billion queries per second on over 8 exabytes of data. This includes some of the largest applications in the world, serving well over a billion users each. The appetite for database storage continues to grow, potentially reaching zettabyte scale (1 billion terabytes) by 2030. However, the end of Moore and Dennard scaling means that the cost of the infrastructure to run those databases could grow much faster than it has in the past. In this talk I will give my perspective on the challenges to reaching zettabyte scale, and the hardware technologies and approaches most (and least) likely to be successful.
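The scale figures above are easy to sanity-check with back-of-the-envelope arithmetic. The short Java calculation below only restates the abstract's numbers, assuming the usual decimal (SI) units in which 1 ZB = 10^21 bytes: a zettabyte is a billion terabytes, and growing from roughly 8 exabytes to a zettabyte is about a 125x increase.

```java
// Sanity check of the scale arithmetic quoted in the abstract.
// Assumes decimal (SI) units: 1 EB = 10^18 bytes, 1 ZB = 10^21 bytes, 1 TB = 10^12 bytes.
final class ScaleMath {
    public static void main(String[] args) {
        double exabyte = 1e18;
        double zettabyte = 1e21;
        double today = 8 * exabyte;                                   // ~8 EB stored today
        System.out.println("1 ZB in TB: " + zettabyte / 1e12);        // 1.0E9 terabytes
        System.out.println("8 EB -> 1 ZB growth: " + zettabyte / today + "x"); // 125x
    }
}
```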
    Detection and Prevention of Silent Data Corruption in an Exabyte-scale Database System
    The 18th IEEE Workshop on Silicon Errors in Logic – System Effects, IEEE (2022)
    Google’s Spanner database serves multiple exabytes of data at well over a billion queries per second, distributed over a significant fraction of Google’s fleet. Silent data corruption events due to hardware error are detected/prevented by Spanner several times per week. For every detected error there are some number of undetected errors that in rare (but not black swan) events cause corruption either transiently for reads or durably for writes, potentially violating the most fundamental contract that a database system makes with its users: to store and retrieve data with absolute reliability and availability. We describe the work we have done to detect and prevent silent data corruptions and (equally importantly) to remove faulty machines from the fleet, both manually and automatically. We present a simplified analytic model of corruption that provides some insights into the most effective ways to prevent end-user corruption events. We have made qualitative gains in detection and prevention of SDC events, but quantitative analysis remains difficult. We discuss various potential trajectories in hardware (un)reliability and how they will affect our ability to build reliable database systems on commodity hardware.
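The paper focuses on fleet-level detection and removal of faulty machines; as a minimal illustration of the kind of end-to-end check that can catch silent corruption between write and read, the sketch below attaches a CRC to each record when it is written and verifies it on every read. The class and method names are hypothetical, not Spanner's.

```java
import java.util.zip.CRC32;

// Illustrative only: a record that carries its own checksum so that bytes
// silently corrupted anywhere between write and read can be detected.
final class ChecksummedRecord {
    private final byte[] payload;
    private final long checksum;

    ChecksummedRecord(byte[] payload) {
        this.payload = payload.clone();
        this.checksum = crc(payload);        // computed once, at write time
    }

    // Verify on every read; a mismatch means the stored bytes changed silently.
    byte[] read() {
        if (crc(payload) != checksum) {
            throw new IllegalStateException("silent data corruption detected");
        }
        return payload.clone();
    }

    private static long crc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }
}
```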
    Spanner: Becoming a SQL System
    Nathan Bales
    Nico Bruno
    Brian F. Cooper
    Adam Dickinson
    Andrew Fikes
    Campbell Fraser
    Andrey Gubarev
    Milind Joshi
    Eugene Kogan
    Sergey Melnik
    Rajesh Rao
    Dave Shue
    Chris Taylor
    Marcel van der Holst
    Dale Woodford
    Proc. SIGMOD 2017, pp. 331-343
    Spanner is a globally-distributed data management system that backs hundreds of mission-critical services at Google. Spanner is built on ideas from both the systems and database communities. The first Spanner paper published at OSDI'12 focused on the systems aspects such as scalability, automatic sharding, fault tolerance, consistent replication, external consistency, and wide-area distribution. This paper highlights the database DNA of Spanner. We describe distributed query execution in the presence of resharding, query restarts upon transient failures, range extraction that drives query routing and index seeks, and the improved blockwise-columnar storage format. We touch upon migrating Spanner to the common SQL dialect shared with other systems at Google.
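Query restarts upon transient failures are handled inside Spanner's engine; the sketch below is only a generic illustration of the restartable-query pattern, in which each batch of results carries an opaque resume token so that a retry continues from where it left off rather than redoing or repeating earlier work. All interfaces and names here are hypothetical, not Spanner's API.

```java
import java.util.List;

// Hypothetical types illustrating the general "restartable query" pattern.
interface QueryEngine {
    Batch execute(String sql, byte[] resumeToken) throws TransientFailure;
}

record Batch(List<String> rows, byte[] resumeToken, boolean done) {}

class TransientFailure extends Exception {}

final class RestartableQuery {
    static void runToCompletion(QueryEngine engine, String sql) {
        byte[] token = null;                  // null token = start from the beginning
        boolean done = false;
        while (!done) {
            try {
                Batch batch = engine.execute(sql, token);
                batch.rows().forEach(System.out::println);
                token = batch.resumeToken();  // remember progress
                done = batch.done();
            } catch (TransientFailure e) {
                // Retry from the last token; already-returned rows are not repeated.
            }
        }
    }
}
```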
    And then there were none: a stall-free real-time garbage collector for reconfigurable hardware
    Perry Cheng
    Sunil Shukla
    Communications of the ACM, 56(2013), pp. 101-109
    Programmers are turning to radical architectures such as reconfigurable hardware (FPGAs) to achieve performance. But such systems, programmed at a very low level in languages with impoverished abstractions, are orders of magnitude more complex to use than conventional CPUs. The continued exponential increase in transistors, combined with the desire to implement ever more sophisticated algorithms, makes it imperative that such systems be programmed at much higher levels of abstraction. One of the fundamental high-level language features is automatic memory management in the form of garbage collection. We present the first implementation of a complete garbage collector in hardware (as opposed to previous "hardware-assist" techniques), using an FPGA and its on-chip memory. Using a completely concurrent snapshot algorithm, it provides single-cycle access to the heap, and never stalls the mutator for even a single cycle, achieving a deterministic mutator utilization (MMU) of 100%. We have synthesized the collector to hardware and show that it never consumes more than 1% of the logic resources of a high-end FPGA. For comparison we also implemented explicit (malloc/free) memory management, and show that real-time collection is about 4% to 17% slower than malloc, with comparable energy consumption. Surprisingly, in hardware real-time collection is superior to stop-the-world collection on every performance axis, and even for stressful micro-benchmarks can achieve 100% MMU with heaps as small as 1.01 to 1.4 times the absolute minimum.
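The collector described above is synthesized to FPGA logic, but the snapshot discipline it depends on can be illustrated in software. The sketch below is a minimal, single-field version of a snapshot-at-the-beginning (Yuasa-style) write barrier: while a collection is active, the overwritten pointer is shaded so that everything reachable at the snapshot instant survives, and the mutator never blocks. The names and one-field object layout are simplifications, not the paper's implementation.

```java
import java.util.ArrayDeque;

// One pointer field per object, for simplicity of illustration.
final class Obj {
    Obj field;
    boolean marked;     // set once the collector has shaded this object
}

final class SnapshotBarrier {
    // Objects whose old values still need tracing (the "gray" set).
    private final ArrayDeque<Obj> gray = new ArrayDeque<>();
    private volatile boolean collecting = false;

    // Every pointer store goes through the barrier. While a collection is
    // active, the *old* value is recorded so that the heap as it existed at
    // the snapshot instant is retained; the mutator never waits.
    void writeField(Obj holder, Obj newValue) {
        if (collecting) {
            Obj old = holder.field;
            if (old != null && !old.marked) {
                old.marked = true;
                gray.add(old);       // collector traces it later
            }
        }
        holder.field = newValue;
    }
}
```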
    A real-time garbage collector with low overhead and consistent utilization
    Perry Cheng
    V. T. Rajan
    Proceedings of the 30th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ACM, New York, NY, USA(2003), pp. 285-298
    Now that the use of garbage collection in languages like Java is becoming widely accepted due to the safety and software engineering benefits it provides, there is significant interest in applying garbage collection to hard real-time systems. Past approaches have generally suffered from one of two major flaws: either they were not provably real-time, or they imposed large space overheads to meet the real-time bounds. We present a mostly non-moving, dynamically defragmenting collector that overcomes both of these limitations: by avoiding copying in most cases, space requirements are kept low; and by fully incrementalizing the collector we are able to meet real-time bounds. We implemented our algorithm in the Jikes RVM and show that at real-time resolution we are able to obtain mutator utilization rates of 45% with only 1.6 to 2.5 times the actual space required by the application, a factor of 4 improvement in utilization over the best previously published results. Defragmentation causes no more than 4% of the traced data to be copied.
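The utilization figures quoted above are minimum mutator utilization (MMU) over a time window: the fraction of any window of a given length in which the application, rather than the collector, gets to run. The sketch below shows one simple way to compute an approximate MMU from a trace of collector pause intervals; the representation and the boundary-only scan are illustrative simplifications, not the paper's measurement code.

```java
import java.util.List;

// Computes mutator utilization from a trace of collector pauses (milliseconds).
final class Utilization {
    record Pause(double start, double end) {}

    // Fraction of [winStart, winStart + window) not spent in collector pauses.
    static double utilization(List<Pause> pauses, double winStart, double window) {
        double pauseTime = 0;
        for (Pause p : pauses) {
            double lo = Math.max(p.start(), winStart);
            double hi = Math.min(p.end(), winStart + window);
            if (hi > lo) pauseTime += hi - lo;
        }
        return (window - pauseTime) / window;
    }

    // Approximate minimum mutator utilization (MMU) for a window length:
    // scan windows anchored at pause boundaries, where the minimum occurs.
    static double mmu(List<Pause> pauses, double window) {
        double min = 1.0;
        for (Pause p : pauses) {
            min = Math.min(min, utilization(pauses, p.start(), window));
            min = Math.min(min, utilization(pauses, p.end() - window, window));
        }
        return min;
    }
}
```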
    Thin locks: Featherweight synchronization for Java
    Ravi Konuru
    Chet Murthy
    Mauricio Serrano
    Proc. ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, ACM, New York, NY, USA, pp. 258-268
    Language-supported synchronization is a source of serious performance problems in many Java programs. Even single-threaded applications may spend up to half their time performing useless synchronization due to the thread-safe nature of the Java libraries. We solve this performance problem with a new algorithm that allows lock and unlock operations to be performed with only a few machine instructions in the most common cases. Our locks only require a partial word per object, and were implemented without increasing object size. We present measurements from our implementation in the JDK 1.1.2 for AIX, demonstrating speedups of up to a factor of 5 in micro-benchmarks and up to a factor of 1.7 in real programs.
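The fast path described above amounts to a single compare-and-swap on a small lock word that encodes the owning thread and a nesting count. The sketch below captures that flavor in plain Java; the bit layout is illustrative, and the contended "fat lock" fallback that the real algorithm inflates to is elided.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified thin-lock sketch: the uncontended case is one compare-and-swap
// on a word encoding (owner thread id, nesting count). Contention or count
// overflow would fall back to a heavier OS-level lock, omitted here.
final class ThinLock {
    // Bits 8..30: owner thread id; bits 0..7: nesting count; 0 means unlocked.
    private final AtomicInteger word = new AtomicInteger(0);

    // Returns true if the lock was acquired on the thin (fast) path.
    boolean lock(int threadId) {
        int fresh = (threadId << 8) | 1;
        if (word.compareAndSet(0, fresh)) {
            return true;                              // uncontended acquire: one CAS
        }
        int current = word.get();
        if ((current >>> 8) == threadId && (current & 0xFF) < 0xFF) {
            word.set(current + 1);                    // nested re-entry by the owner
            return true;
        }
        return false;   // contended or count overflow: would inflate to a fat lock
    }

    void unlock(int threadId) {
        int current = word.get();
        if ((current >>> 8) != threadId) throw new IllegalMonitorStateException();
        word.set((current & 0xFF) == 1 ? 0 : current - 1);   // release or un-nest
    }
}
```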
    Fast static analysis of C++ virtual function calls
    Peter F. Sweeney
    Proc. 11th ACM Conference on Object-oriented Programming, Systems, Languages, and Applications, ACM, New York, NY, USA (1996), pp. 324-341
    Virtual functions make code easier for programmers to reuse but also make it harder for compilers to analyze. We investigate the ability of three static analysis algorithms to improve C++ programs by resolving virtual function calls, thereby reducing compiled code size and reducing program complexity so as to improve both human and automated program understanding and analysis. In measurements of seven programs of significant size (5000 to 20000 lines of code each) we found that on average the most precise of the three algorithms resolved 71% of the virtual function calls and reduced compiled code size by 25%. This algorithm is very fast: it analyzes 3300 source lines per second on an 80 MHz PowerPC 601. Because of its accuracy and speed, this algorithm is an excellent candidate for inclusion in production C++ compilers.
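One standard analysis in this space is class hierarchy analysis: a virtual call can be turned into a direct call when the receiver's static type and all of its subclasses supply only one implementation of the invoked method. The sketch below is a deliberately simplified, string-keyed version of that idea, not the analysis implementation measured in the paper.

```java
import java.util.*;

// Simplified class hierarchy analysis (CHA): resolve a virtual call when the
// subtype cone of the receiver's static type declares exactly one target.
final class ClassHierarchyAnalysis {
    private final Map<String, Set<String>> subclasses = new HashMap<>(); // class -> direct subclasses
    private final Set<String> declares = new HashSet<>();                // "Class::method" declared

    void addClass(String cls, String superCls) {
        subclasses.computeIfAbsent(superCls, k -> new HashSet<>()).add(cls);
    }

    void addMethod(String cls, String method) {
        declares.add(cls + "::" + method);
    }

    // Walk the cone rooted at 'staticType'; if exactly one class in it
    // declares 'method', the call site can be made direct.
    Optional<String> resolve(String staticType, String method) {
        Set<String> targets = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(List.of(staticType));
        while (!work.isEmpty()) {
            String cls = work.pop();
            if (declares.contains(cls + "::" + method)) targets.add(cls);
            work.addAll(subclasses.getOrDefault(cls, Set.of()));
        }
        return targets.size() == 1 ? Optional.of(targets.iterator().next())
                                   : Optional.empty();
    }
}
```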
    Compiler transformations for high-performance computing
    Susan L. Graham
    Oliver J. Sharp
    ACM Computing Surveys, 26(1994), pp. 345-420
    In the last three decades a large number of compiler transformations for optimizing programs have been implemented. Most optimizations for uniprocessors reduce the number of instructions executed by the program using transformations based on the analysis of scalar quantities and data-flow techniques. In contrast, optimizations for high-performance superscalar, vector, and parallel processors maximize parallelism and memory locality with transformations that rely on tracking the properties of arrays using loop dependence analysis. This survey is a comprehensive overview of the important high-level program restructuring techniques for imperative languages, such as C and Fortran. Transformations for both sequential and various types of parallel architectures are covered in depth. We describe the purpose of each transformation, explain how to determine if it is legal, and give an example of its application. Programmers wishing to enhance the performance of their code can use this survey to improve their understanding of the optimizations that compilers can perform, or as a reference for techniques to be applied manually. Students can obtain an overview of optimizing compiler technology. Compiler writers can use this survey as a reference for most of the important optimizations developed to date, and as bibliographic reference for the details of each optimization. Readers are expected to be familiar with modern computer architecture and basic program compilation techniques.
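As one concrete example of the loop restructuring transformations the survey catalogs, loop interchange swaps two nested loops so that the innermost loop walks memory with unit stride, improving cache locality; it is legal here because the loop body carries no dependence between iterations. The sketch is in Java for brevity (the survey itself targets languages such as C and Fortran); Java's arrays of arrays store each row contiguously, so the effect is analogous to row-major storage.

```java
// Loop interchange: same result, different traversal order.
final class LoopInterchange {
    // Before: the inner loop strides across rows, touching a new row each step.
    static long sumColumnOrder(int[][] a) {
        long sum = 0;
        for (int j = 0; j < a[0].length; j++)
            for (int i = 0; i < a.length; i++)
                sum += a[i][j];
        return sum;
    }

    // After interchange: the inner loop walks each row contiguously (unit stride).
    static long sumRowOrder(int[][] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a[i].length; j++)
                sum += a[i][j];
        return sum;
    }
}
```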