Software Systems

Delivering Google's products to our users requires computer systems that operate at a scale previously unknown to the industry. Building on our hardware foundation, we develop technology across the entire systems stack, from operating system device drivers all the way up to multi-site software systems that run on hundreds of thousands of computers. We design, build, and operate warehouse-scale computer systems that are deployed across the globe. We build storage systems that scale to exabytes, approach the performance of RAM, and never lose a byte. We design algorithms that transform our understanding of what is possible. Thanks to the distributed systems we provide our developers, they are some of the most productive in the industry. And we write and publish research papers to share what we have learned, and because peer feedback and interaction help us build better systems that benefit everybody.

Recent Publications

2022 marked the 50th anniversary of memory safety vulnerabilities, first reported by Anderson et al. Half a century later, we are still dealing with memory safety bugs despite substantial investments to improve memory-unsafe languages. Like others', Google's data and internal vulnerability research show that memory safety bugs are widespread and one of the leading causes of vulnerabilities in memory-unsafe codebases. Those vulnerabilities endanger end users, our industry, and the broader society. At Google, we have decades of experience addressing, at scale, large classes of vulnerabilities that were once as prevalent as memory safety issues. Based on this experience, we expect that high-assurance memory safety can only be achieved via a Secure-by-Design approach centered around comprehensive adoption of languages with rigorous memory safety guarantees. We see no realistic path for an evolution of C++ into a language with rigorous memory safety guarantees that include temporal safety. As a consequence, we are considering a gradual transition of C++ code at Google towards other languages that are memory safe. Given the large volume of pre-existing C++, we believe it is nonetheless necessary to improve the safety of C++ to the extent practicable. We are considering transitioning to a safer C++ subset, augmented with hardware security features like MTE.
Dynamic Inference of Likely Symbolic Tensor Shapes in Python Machine Learning Programs
Koushik Sen
International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP) (2024) (to appear)
In machine learning programs, it is often tedious to annotate the dimensions of the shapes of the various tensors that are created during execution. We present a dynamic likely tensor shape inference analysis that annotates the dimensions of shapes of tensor expressions with symbolic dimension values. Such annotations can be used for understanding machine learning code written in popular frameworks, such as TensorFlow, PyTorch, and JAX, and for finding bugs related to tensor shape mismatches.
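As a rough illustration of what "likely symbolic shapes" can mean, the sketch below records concrete shapes from several runs and gives a shared symbol to dimensions that always vary together. It is not the paper's algorithm: the function name `infer_symbolic_shapes`, the trace format, and the tensor names are all hypothetical, and the sketch assumes every run observes the same tensors with the same rank.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]
Trace = Dict[str, Shape]  # hypothetical format: tensor name -> shape seen in one run


def infer_symbolic_shapes(traces: List[Trace]) -> Dict[str, Tuple[str, ...]]:
    """Label each dimension with a constant or a shared symbol (d0, d1, ...)."""
    symbols: Dict[Tuple[int, ...], str] = {}
    result: Dict[str, Tuple[str, ...]] = {}
    for name, first_shape in traces[0].items():
        labels = []
        for axis in range(len(first_shape)):
            # The values this dimension took across all observed runs.
            observed = tuple(trace[name][axis] for trace in traces)
            if len(set(observed)) == 1:
                # Never varied: likely a constant dimension.
                labels.append(str(observed[0]))
            else:
                # Dimensions with identical observations share one symbol.
                if observed not in symbols:
                    symbols[observed] = f"d{len(symbols)}"
                labels.append(symbols[observed])
        result[name] = tuple(labels)
    return result


# Two runs of a hypothetical linear layer with different batch sizes.
traces = [
    {"x": (32, 128), "w": (128, 10), "logits": (32, 10)},
    {"x": (64, 128), "w": (128, 10), "logits": (64, 10)},
]
annotations = infer_symbolic_shapes(traces)
for name, shape in annotations.items():
    print(name, shape)
```

The annotations are only "likely": they hold for the runs observed, so more varied inputs give more trustworthy symbols.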
Grisette: Symbolic Compilation as a Functional Programming Library
Sirui Lu
Grisette: Symbolic Compilation as a Functional Programming Library, ACM (2023) (to appear)
The development of constraint solvers simplified automated reasoning about programs and shifted the engineering burden to implementing symbolic compilation tools that translate programs into efficiently solvable constraints. We describe Grisette, a reusable symbolic evaluation framework for implementing domain-specific symbolic compilers. Grisette evaluates all execution paths and merges their states into a normal form that avoids making guards mutually exclusive. This ordered-guards representation reduces the constraint size 5-fold and the solving time more than 2-fold. Grisette is designed entirely as a library, which sidesteps the complications of lifting the host language into the symbolic domain. Grisette is purely functional, enabling memoization of symbolic compilation as well as monadic integration with host libraries. Grisette is statically typed, which allows catching programming errors at compile time rather than delaying their detection to the constraint solver. We implemented Grisette in Haskell and evaluated it on benchmarks that stress both the symbolic evaluation and constraint solving.
PTStore: Lightweight Architectural Support for Page Table Isolation
Wende Tan
Yangyu Chen
Yuan Li
Ying Liu
Jianping Wu
Chao Zhang
2023 60th ACM/IEEE Design Automation Conference (DAC), IEEE, pp. 1-6
Page tables are critical data structures in kernels, serving as the trust base of most mitigation solutions. Their integrity is thus crucial but is often taken for granted. Existing page table protection solutions usually provide insufficient security guarantees, require heavy hardware, or introduce high overheads. In this paper, we present a novel lightweight hardware-software co-design solution, PTStore, consisting of a secure region storing page tables and tokens verifying page table pointers. Evaluation results on FPGA-based prototypes show that PTStore only introduces <0.92% hardware overheads and <0.86% performance overheads, but provides strong security guarantees, showing that PTStore is efficient and effective.
High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs
William S. Moses
Ivan R. Ivanov
Jens Domke
Toshio Endo
Johannes Doerfert
Proceedings of the Intl Conference on Principles and Practice of Parallel Programming (PPoPP), ACM (2023) (to appear)
While parallelism remains the main source of performance, architectural implementations and programming models change with each new hardware generation, often leading to costly application re-engineering. Most tools for performance portability require manual and costly application porting to yet another programming model. We propose an alternative approach that automatically translates programs written in one programming model (CUDA), into another (CPU threads) based on Polygeist/MLIR. Our approach includes a representation of parallel constructs that allows conventional compiler transformations to apply transparently and without modification and enables parallelism-specific optimizations. We evaluate our framework by transpiling and optimizing the CUDA Rodinia benchmark suite for a multi-core CPU and achieve a 76% geomean speedup over handwritten OpenMP code. Further, we show how CUDA kernels from PyTorch can efficiently run and scale on the CPU-only Supercomputer Fugaku without user intervention. Our PyTorch compatibility layer making use of transpiled CUDA PyTorch kernels outperforms the PyTorch CPU native backend by 2.7x.
You Only Linearize Once: Tangents Transpose to Gradients
Alexey Radul
Adam Paszke
Matthew Johnson
Dougal Maclaurin
POPL (2023), pp. 1246-1274
Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two "modes" -- forward and reverse -- which are typically presented (and implemented) separately. Can there be only one? Following up on the AD systems developed in the JAX and Dex projects, we formalize a decomposition of reverse-mode AD into (i) forward-mode AD followed by (ii) unzipping the linear and non-linear parts and then (iii) transposition of the linear part. To that end, we define a (substructurally) linear type system that can prove a class of functions are (algebraically) linear. Our main results are that forward-mode AD produces such linear functions, and that we can unzip and transpose any such linear function, conserving cost, size, and linearity. Composing these three transformations recovers reverse-mode AD. This decomposition also sheds light on checkpointing, which emerges naturally from a free choice in unzipping `let` expressions. As a corollary, checkpointing techniques are applicable to general-purpose partial evaluation, not just AD. We hope that our formalization will lead to a deeper understanding of automatic differentiation and that it will simplify implementations, by separating the concerns of differentiation proper from the concerns of gaining efficiency (namely, separating the derivative computation from the act of running it backward).
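The three-step decomposition can be illustrated on a single hand-picked function. The sketch below is not the paper's formalization: the example function f(x1, x2) = sin(x1) * x2, the tiny "linear program" encoding, and the names `linearize`, `run_linear`, and `transpose` are all assumptions made for illustration. The forward-mode and unzipping steps are written by hand for this one function; only the transposition step is generic.

```python
import math
from collections import defaultdict


def linearize(x1, x2):
    """Steps (i)+(ii): forward-mode AD of f(x1, x2) = sin(x1) * x2, unzipped.

    The non-linear work (sin, cos, the product) runs once here. What is
    left is a purely linear program over tangents, encoded as tuples:
      ("scale", out, inp, coeff)  means  out = coeff * inp
      ("add",   out, a,   b)      means  out = a + b
    """
    s, c = math.sin(x1), math.cos(x1)
    y = s * x2
    program = [
        ("scale", "ds", "dx1", c),   # tangent of sin(x1)
        ("scale", "t1", "ds", x2),   # product rule, first term
        ("scale", "t2", "dx2", s),   # product rule, second term
        ("add", "dy", "t1", "t2"),
    ]
    return y, program


def run_linear(program, tangents):
    """Evaluate the linear program forward: a Jacobian-vector product."""
    env = dict(tangents)
    for op, out, a, b in program:
        env[out] = b * env[a] if op == "scale" else env[a] + env[b]
    return env["dy"]


def transpose(program, ct_out):
    """Step (iii): run the linear program backward, accumulating cotangents."""
    ct = defaultdict(float)
    ct["dy"] = ct_out
    for op, out, a, b in reversed(program):
        if op == "scale":
            ct[a] += b * ct[out]   # transpose of scaling is scaling
        else:
            ct[a] += ct[out]       # transpose of addition is fan-out
            ct[b] += ct[out]
    return ct["dx1"], ct["dx2"]


y, program = linearize(0.5, 3.0)
jvp = run_linear(program, {"dx1": 1.0, "dx2": 0.0})  # forward-mode derivative
grad = transpose(program, 1.0)                        # reverse-mode gradient
```

Here `transpose` never evaluates `sin` or `cos`. It only reuses the coefficients stored by `linearize`, which mirrors the abstract's point that the derivative computation is separate from running it backward.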