Urs Hölzle

Urs Hölzle is a Google Fellow in Google Cloud. Until 2023 he was the Senior Vice President for Technical Infrastructure at Google. In this capacity, Urs oversaw the design, installation, and operation of the servers, networks, and datacenters that power Google's services. Through efficiency innovations, Urs and his team have reduced the energy used by Google datacenters to less than 50% of the industry average. Urs is renowned for both his red socks and his free-range Leonberger, Yoshka (Google's top dog).

Urs grew up in Switzerland and received a master's degree in computer science from ETH Zurich and, as a Fulbright scholar, a Ph.D. from Stanford. While at Stanford, and then at a small start-up later acquired by Sun Microsystems, he invented fundamental techniques used in most of today's leading Java compilers. Before joining Google he was a professor of computer science at the University of California, Santa Barbara. He is a Fellow of the ACM and a member of the US National Academy of Engineering and the Swiss Academy of Technical Sciences.

Authored Publications
    Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network
    Joon Ong
    Amit Agarwal
    Glen Anderson
    Ashby Armistead
    Roy Bannon
    Seb Boving
    Gaurav Desai
    Bob Felderman
    Paulie Germano
    Anand Kanagala
    Jeff Provost
    Jason Simmons
    Eiichi Tanda
    Jim Wanderer
    Stephen Stuart
    Communications of the ACM, Vol. 59, No. 9 (2016), pp. 88-97
    We present our approach for overcoming the cost, operational complexity, and limited scale endemic to datacenter networks a decade ago. Three themes unify the five generations of datacenter networks detailed in this paper. First, multi-stage Clos topologies built from commodity switch silicon can support cost-effective deployment of building-scale networks. Second, many of the general, but complex, decentralized network routing and management protocols supporting arbitrary deployment scenarios were overkill for single-operator, pre-planned datacenter networks. We built a centralized control mechanism based on a global configuration pushed to all datacenter switches. Third, modular hardware design coupled with simple, robust software allowed our design to also support inter-cluster and wide-area networks. Our datacenter networks run at dozens of sites across the planet, scaling in capacity by 100x over 10 years to more than 1 Pbps of bisection bandwidth.
    Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network
    Joon Ong
    Amit Agarwal
    Glen Anderson
    Ashby Armistead
    Roy Bannon
    Seb Boving
    Gaurav Desai
    Paulie Germano
    Jeff Provost
    Jason Simmons
    Eiichi Tanda
    Jim Wanderer
    Amin Vahdat
    SIGCOMM '15, Google Inc (2015)
    We present our approach for overcoming the cost, operational complexity, and limited scale endemic to datacenter networks a decade ago. Three themes unify the five generations of datacenter networks detailed in this paper. First, multi-stage Clos topologies built from commodity switch silicon can support cost-effective deployment of building-scale networks. Second, many of the general, but complex, decentralized network routing and management protocols supporting arbitrary deployment scenarios were overkill for single-operator, pre-planned datacenter networks. We built a centralized control mechanism based on a global configuration pushed to all datacenter switches. Third, modular hardware design coupled with simple, robust software allowed our design to also support inter-cluster and wide-area networks. Our datacenter networks run at dozens of sites across the planet, scaling in capacity by 100x over ten years to more than 1 Pbps of bisection bandwidth.
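To give a feel for why multi-stage Clos fabrics built from commodity switch silicon scale to building size, the sketch below sizes the textbook three-tier k-ary fat tree (a folded Clos) built from identical k-port switches. This is a generic construction, assumed here for illustration only; it is not the Jupiter design described in the papers above, and the function name `fat_tree_capacity` and the example port counts and link speeds are hypothetical.

```python
def fat_tree_capacity(k: int, link_gbps: float) -> dict[str, float]:
    """Size a three-tier k-ary fat tree (a folded Clos) of identical k-port switches.

    Assumes the textbook construction: k pods, each with k/2 edge and k/2
    aggregation switches, plus (k/2)**2 core switches; every link runs at
    link_gbps and the fabric is non-blocking.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    hosts = k**3 // 4          # k pods * (k/2 edge switches) * (k/2 hosts per edge switch)
    switches = 5 * k * k // 4  # k*k pod switches plus k*k/4 core switches
    # In a non-blocking fabric, half the hosts can send to the other half at line rate.
    bisection_gbps = hosts * link_gbps / 2
    return {"hosts": hosts, "switches": switches, "bisection_gbps": bisection_gbps}


if __name__ == "__main__":
    # Hypothetical examples: small 4-port and 48-port switches with 10 Gb/s links.
    for ports in (4, 48):
        print(ports, fat_tree_capacity(ports, link_gbps=10.0))
```

Host count grows with the cube of the switch port count, which is why a fabric of many identical, inexpensive switches can reach building scale without any single large switch.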
    B4: Experience with a Globally Deployed Software Defined WAN
    Sushant Jain
    Joon Ong
    Subbaiah Venkata
    Jim Wanderer
    Junlan Zhou
    Min Zhu
    Amin Vahdat
    Proceedings of the ACM SIGCOMM Conference, Hong Kong, China (2013)
    As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today's WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today's WSCs on a single board.
    Notes for the Second Edition
    After nearly four years of substantial academic and industrial developments in warehouse-scale computing, we are delighted to present our first major update to this lecture. The increased popularity of public clouds has made WSC software techniques relevant to a larger pool of programmers since our first edition. Therefore, we expanded Chapter 2 to reflect our better understanding of WSC software systems and the toolbox of software techniques for WSC programming. In Chapter 3, we added to our coverage of the evolving landscape of wimpy vs. brawny server trade-offs, and we now present an overview of WSC interconnects and storage systems that was promised but lacking in the original edition.
    Thanks largely to the help of our new co-author, Google Distinguished Engineer Jimmy Clidaras, the material on facility mechanical and power distribution design has been updated and greatly extended (see Chapters 4 and 5). Chapters 6 and 7 have also been revamped significantly. We hope this revised edition continues to meet the needs of educators and professionals in this area.
    As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today's WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today's WSCs on a single board.
    The Case for Energy-Proportional Computing
    Luiz André Barroso
    IEEE Computer, 40 (2007)
    In current servers, the lowest energy-efficiency region corresponds to their most common operating mode. Addressing this perfect mismatch will require significant rethinking of components and systems. To that end, we propose that energy proportionality should become a primary design goal. Energy-proportional designs would enable large energy savings in servers, potentially doubling their efficiency in real-life use. Achieving energy proportionality will require significant improvements in the energy usage profile of every system component, particularly the memory and disk subsystems. Although our experience in the server space motivates these observations, we believe that energy-proportional computing will also benefit other types of computing devices.
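The mismatch described above can be made concrete with a simple model. The sketch below assumes, purely for illustration, that a server's power draw rises linearly from an idle floor to its peak; it then computes work per unit of energy relative to running at full load. The linear model, the idle fractions, and the 30% utilization in the example are assumptions, not figures taken from the article, and the function names are hypothetical.

```python
def power_fraction(utilization: float, idle_fraction: float) -> float:
    """Power draw as a fraction of peak, assuming power rises linearly from idle to peak."""
    if not (0.0 <= utilization <= 1.0 and 0.0 <= idle_fraction < 1.0):
        raise ValueError("utilization must be in [0, 1] and idle_fraction in [0, 1)")
    return idle_fraction + (1.0 - idle_fraction) * utilization


def relative_efficiency(utilization: float, idle_fraction: float) -> float:
    """Work per unit of energy, normalized so that running at peak load scores 1.0."""
    if utilization == 0.0:
        return 0.0  # no useful work is done, whatever the power draw
    return utilization / power_fraction(utilization, idle_fraction)


if __name__ == "__main__":
    # Illustrative assumptions: idle power at 50%, 10%, and 0% of peak,
    # with the server running at 30% utilization.
    for idle in (0.5, 0.1, 0.0):
        print(f"idle={idle:.0%}: relative efficiency={relative_efficiency(0.3, idle):.2f}")
```

Under these assumptions, a server that idles at half its peak power delivers about 0.46 of its peak-load efficiency at 30% utilization, while one idling at 10% delivers about 0.81. A perfectly proportional design (idle power of zero) is equally efficient at every load.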
    The focus of our message is efficiency: power efficiency and programming efficiency. There are several hard technical problems surrounding power efficiency of computers, but we've found one that is actually not particularly challenging and could have a huge impact on the energy used by home computers and low-end servers: increasing power supply efficiency.
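The arithmetic behind power supply efficiency is short: to deliver a given DC load, a supply with efficiency η draws load / η from the wall, and the difference is lost as heat. The sketch below applies that relation to a hypothetical machine. The 100 W load, the 70% and 90% efficiencies, and continuous year-round operation are illustrative assumptions rather than figures from the source, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def wall_power_watts(dc_load_watts: float, psu_efficiency: float) -> float:
    """AC power drawn from the outlet to deliver dc_load_watts through a supply of the given efficiency."""
    if not 0.0 < psu_efficiency <= 1.0:
        raise ValueError("psu_efficiency must be in (0, 1]")
    return dc_load_watts / psu_efficiency


def annual_savings_kwh(dc_load_watts: float, old_efficiency: float, new_efficiency: float) -> float:
    """Energy saved per year by switching supplies, assuming the load runs around the clock."""
    saved_watts = (wall_power_watts(dc_load_watts, old_efficiency)
                   - wall_power_watts(dc_load_watts, new_efficiency))
    return saved_watts * HOURS_PER_YEAR / 1000.0  # watt-hours to kilowatt-hours


if __name__ == "__main__":
    # Hypothetical example: a 100 W load moved from a 70%-efficient to a 90%-efficient supply.
    print(f"{annual_savings_kwh(100.0, 0.70, 0.90):.1f} kWh saved per year")
```

Because the loss scales with 1 / η, gains at the low-efficiency end are the largest, which is why improving inexpensive supplies in home computers and low-end servers can matter so much in aggregate.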
    Monkey See, Monkey Do: A Tool for TCP Tracing and Replaying
    Stefan Savage
    Geoffrey M. Voelker
    USENIX Annual Technical Conference, General Track (2004)
    Amenable to extensive parallelization, Google's Web search application lets different queries run on different processors and, by partitioning the overall index, also lets a single query use multiple processors. To handle this workload, Google's architecture features clusters of more than 15,000 commodity-class PCs with fault-tolerant software. This architecture achieves superior performance at a fraction of the cost of a system built from fewer, but more expensive, high-end servers.
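The idea of letting a single query use multiple processors by partitioning the index can be sketched in a few lines. The example below is a minimal sketch, assuming a document-partitioned inverted index: documents are assigned to shards by ID, each shard answers the query independently (here in parallel threads), and the partial results are merged. It is not Google's implementation; the sharding rule, the AND-only query semantics, and the names `build_shards`, `search_shard`, and `search` are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_shards(docs: dict[int, str], num_shards: int) -> list[dict[str, set[int]]]:
    """Split documents across shards by ID and build an inverted index per shard."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    shards: list[dict[str, set[int]]] = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shard = shards[doc_id % num_shards]  # hypothetical rule: shard by document ID
        for term in text.lower().split():
            shard[term].add(doc_id)
    return shards


def search_shard(shard: dict[str, set[int]], terms: list[str]) -> set[int]:
    """Return the documents in one shard that contain every query term."""
    postings = [shard.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def search(shards: list[dict[str, set[int]]], query: str) -> list[int]:
    """Fan the query out to all shards in parallel, then merge the partial results."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, terms), shards))
    return sorted(set().union(*partials))


if __name__ == "__main__":
    docs = {1: "warehouse scale computer", 2: "datacenter network", 3: "warehouse network"}
    shards = build_shards(docs, num_shards=2)
    print(search(shards, "warehouse network"))
```

Because each shard holds complete posting lists for its own documents, shards never need to talk to each other while answering a query; only the final merge combines their results, which is what lets one query spread across many machines.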