Xiaozhou Li

Xiaozhou Li

Xiaozhou (Steve) Li has been a software engineer at Google since 2011. Before joining Google, he was a storage systems researcher at HP Labs. His main interests are in distributed computing/systems. He received his PhD in computer science from the University of Texas at Austin. He has published at Communications of the ACM, SIAM Journal on Computing, PODC, SIGCOMM, and WWW (full list).
Authored Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Twenty years of Bigtable
    Fabio Baltieri
    Bora Beran
    Igor Bernstein
    Aimee Borda
    Adrian Chan
    Mark D'Andrea
    Artak Dashyan
    Ramesh Dharan
    Gabor Dinnyes
    Mike Dominguez
    dorland .
    Jose Duenas
    Gary Elliott
    Bruno Furtado
    Madison Garcia
    Marçal Garolera Huguet
    Brendan Gleason
    Alexis Hawkins
    Anoshak Irani
    Rohit Jog
    Sudarshan Kadambi
    Vikram Khemka
    Sailesh Krishnamurthy
    Maxim Krivokon
    Bruce Lee
    Tom Magrino
    Matt Maly
    Mark Mangrich
    Douglas McErlean
    Pablo Montes
    Li Moore
    Eduardo Morales
    Greg Morris
    Steve Niemitz
    Gaurav Prabhu Gaonkar
    Jim Rutherford
    Stephen Ryan
    Sho Saha
    Kanoj Sarcar
    Cristina Schmidt
    Andrii Shyshkalov
    Pratibha Suryadevara
    Nick Suttle
    Anvit Tawar
    John Tobin
    Justin Uang
    Phaneendhar Vemuru
    Harendra Verma
    Shitanshu Verma
    Jinghang (Frank) Wang
    Michal Wegorek
    Simon Yau
    Andrius Ziukas
    SIGMOD Companion '26: Companion of the International Conference on Management of Data, ACM (2026), pp. 188-200
    Preview abstract Bigtable is a pioneering and influential non-relational database system. The original Bigtable paper has been widely cited and it inspired and influenced many other systems such as HBase and Cassandra. Since then, Bigtable has continued to grow and has become one of the largest database systems inside Google. In this paper, we tell the journey of Bigtable inside Google for the last twenty years. We present new features added and improvements made to Bigtable, and we share our experience of running this storage system at scale, continually improving all aspects to accommodate the ever-growing demands of users. View details
    Learned Indexes for a Google-scale Disk-based Database
    Hussam Abu-Libdeh
    Alex Beutel
    Lyric Pankaj Doshi
    Tim Klas Kraska
    Chris Olston
    (2020)
    Preview abstract There is great excitement about learned index structures, but understandable skepticism about the practicality of a new method uprooting decades of research on B-Trees. In this paper, we work to remove some of that uncertainty by demonstrating how a learned index can be integrated in a distributed, disk-based database system: Google’s Bigtable. We detail several design decisions we made to integrate learned indexes in Bigtable. Our results show that integrating learned index significantly improves the end-to-end read latency and throughput for Bigtable. View details
    Computing weak consistency in polynomial time
    Wojciech Golab
    Alejandro López-Ortiz
    Naomi Nishimura
    Proceedings of the 2015 ACM Symposium on Principles of Distributed Computing, ACM, New York, NY, USA, pp. 395-404
    Preview abstract The k-atomicity property can be used to describe the consistency of data operations in large distributed storage systems. The weak consistency guarantees offered by such systems are seen as a necessary compromise in view of Brewer's CAP principle. The k-atomicity property requires that every read operation obtains a value that is at most k updates (writes) old, and becomes a useful way to quantify weak consistency if k is treated as a variable that can be computed from a history of operations. Specifically, the value of k quantifies how far the history deviates from Lamport's atomicity property for read/write registers. We address the problem of computing k indirectly by solving the k-atomicity verification problem (k-AV): given a history of read/write operations and a positive integer k, decide whether the history is k-atomic. Gibbons and Korach showed that in general this problem is NP-complete when k = 1, and hence not solvable in polynomial time unless P = NP. In this paper we present two algorithms that solve the k-AV problem for any k >= 2 in special cases. Similarly to known solutions for k = 1 and k = 2, both algorithms assume that all the values written to a given object are distinct. The first algorithm places an additional restriction on the structure of the input history and solves k-AV in O(n^2 + n (k log k) time. The second algorithm does not place any additional restrictions on the input but is efficient only when k is small and when concurrency among write operations is limited. Its time complexity is O(n^2) if both k and our particular measure of write concurrency are bounded by constants. View details
    Eventually consistent: Not what you were expecting?
    Wojciech Golab
    Muntasir R. Rahman
    Alvin AuYoung
    Kimberly Keeton
    Communications of the ACM, 57, no. 3 (2014), pp. 38-44
    On the k-atomicity-verification problem
    Wojciech Golab
    The 33rd International Conference on Distributed Computing Systems, IEEE (2013)
    Preview abstract Modern Internet-scale storage systems often provide weak consistency in exchange for better perfor- mance and resilience. An important weak consistency prop- erty is k-atomicity, which bounds the staleness of values returned by read operations. The k-atomicity-verification problem (or k-AV for short) is the problem of deciding whether a given history of operations is k-atomic. The 1-AV problem is equivalent to verifying atomicity/linearizability, a well-known and solved problem. However, for k ? 2, no polynomial-time k-AV algorithm is known. This paper makes the following contributions towards solving the k-AV problem. First, we present a simple 2- AV algorithm called LBT, which is likely to be efficient (quasilinear) for histories that arise in practice, although it is less efficient (quadratic) in the worst case. Second, we present a more involved 2-AV algorithm called FZF, which runs efficiently (quasilinear) even in the worst case. To our knowledge, these are the first algorithms that solve the 2-AV problem fully. Third, we show that the weighted k-AV problem, a natural extension of the k-AV problem, is NP-complete. View details
    ×