Luiz André Barroso

Luiz André Barroso was a Google Fellow. Over his more than two decades at Google, he worked as a VP of Engineering in the Core and Maps teams, and as a technical leader in areas such as Google Search and the design of Google’s computing platform. Luiz published several technical papers and co-authored "The Datacenter as a Computer", the first textbook to describe the architecture of warehouse-scale computing systems, now in its 3rd edition. Luiz was a Fellow of the ACM and the American Association for the Advancement of Science, a member of the National Academy of Engineering and the American Academy of Arts & Sciences, and a recipient of the 2020 ACM/IEEE Computer Society Eckert-Mauchly Award. He held B.S. and M.S. degrees in Electrical Engineering from the Pontifícia Universidade Católica do Rio de Janeiro and a Ph.D. in Computer Engineering from the University of Southern California.
Authored Publications
    A Brief History of Warehouse-Scale Computing
    Luiz André Barroso
    IEEE Micro, 41(2) (2021), pp. 78-83
    Receiving the 2020 ACM-IEEE Eckert-Mauchly Award this past June was among the most rewarding experiences of my career. I am grateful to IEEE Micro for giving me the opportunity to share here the story behind the work that led to this award, a short version of my professional journey so far, as well as a few things I learned along the way.
    The Datacenter as a Computer: designing warehouse-scale machines
    Luiz André Barroso
    Urs Hölzle
    Parthasarathy Ranganathan
    Morgan & Claypool Publishers (2018)
    This book describes warehouse-scale computers (WSCs), the computing platforms that power cloud computing and all the great web services we use every day. It discusses how these new systems treat the datacenter itself as one massive computer designed at warehouse scale, with hardware and software working in concert to deliver good levels of internet service performance. The book details the architecture of WSCs and covers the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. Each chapter contains multiple real-world examples, including detailed case studies and previously unpublished details of the infrastructure used to power Google’s online services. Targeted at the architects and programmers of today’s WSCs, this book provides a great foundation for those looking to innovate in this fascinating and important area, but the material will also be broadly interesting to those who just want to understand the infrastructure powering the internet. The third edition reflects four years of advancements since the previous edition and nearly doubles the number of pictures and figures. New topics range from additional workloads like video streaming, machine learning, and public cloud to specialized silicon accelerators, storage and network building blocks, and a revised discussion of data center power and cooling, and uptime. Further discussions of emerging trends and opportunities ensure that this revised edition will remain an essential resource for educators and professionals working on the next generation of WSCs.
    Attack of the killer microseconds
    Luiz André Barroso
    Mike Marty
    David Patterson
    Parthasarathy Ranganathan
    Communications of the ACM, 60(4) (2017), pp. 48-54
    The computer systems we use today make it easy for programmers to mitigate event latencies in the nanosecond and millisecond time scales (such as DRAM accesses at tens or hundreds of nanoseconds and disk I/Os at a few milliseconds) but lack meaningful support for microsecond (μs)-scale events. This oversight is quickly becoming a serious problem for programming warehouse-scale computers, where efficient handling of microsecond-scale events is becoming paramount for a new breed of low-latency I/O devices ranging from datacenter networking to computing accelerators.
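    As an aside, a back-of-envelope sketch (not from the article; the costs below are illustrative round numbers) shows why microsecond-scale events sit in an awkward gap: spin-waiting burns far too many cycles, while blocking and rescheduling, itself a microsecond-scale operation, adds overhead comparable to the event being hidden.

```python
# Illustrative, assumed costs, not measurements from the article.
CPU_GHZ = 2.0                # cycles per nanosecond
CONTEXT_SWITCH_US = 5.0      # assumed cost of blocking plus rescheduling

EVENT_LATENCY_US = {         # latency scales of the kind the abstract names
    "DRAM access": 0.1,
    "datacenter RPC": 10.0,
    "flash read": 100.0,
    "disk I/O": 10_000.0,
}

for name, lat_us in EVENT_LATENCY_US.items():
    wasted_cycles = lat_us * 1_000 * CPU_GHZ      # cycles burned by spin-waiting
    switch_overhead = CONTEXT_SWITCH_US / lat_us  # relative cost of blocking instead
    print(f"{name:>14}: spinning wastes {wasted_cycles:>12,.0f} cycles; "
          f"blocking adds {switch_overhead:7.1%} overhead")
```

    At nanosecond scales spinning is cheap, and at millisecond scales a context switch is noise; in between, neither option is attractive, which is the gap the article highlights.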
    Towards Energy Proportionality for Large-Scale Latency-Critical Workloads
    Rama Govindaraju
    Luiz André Barroso
    Christos Kozyrakis
    Proceedings of the 41st Annual International Symposium on Computer Architecture, ACM (2014)
    Reducing the energy footprint of warehouse-scale computer (WSC) systems is key to their affordability, yet difficult to achieve in practice. The lack of energy proportionality of typical WSC hardware and the fact that important workloads (such as search) require all servers to remain up regardless of traffic intensity renders existing power management techniques ineffective at reducing WSC energy use. We present PEGASUS, a feedback-based controller that significantly improves the energy proportionality of WSC systems, as demonstrated by a real implementation in a Google search cluster. PEGASUS uses request latency statistics to dynamically adjust server power management limits in a fine-grain manner, running each server just fast enough to meet global service-level latency objectives. In large cluster experiments, PEGASUS reduces power consumption by up to 20%. We also estimate that a distributed version of PEGASUS can nearly double these savings.
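    To make the feedback idea concrete, here is a minimal, self-contained sketch in the spirit of the controller the abstract describes: measure tail latency, then nudge a per-server power cap just enough to keep latency at the target. The constants, step sizes, and toy latency model are hypothetical, not details of the production PEGASUS.

```python
import random

TARGET_MS = 50.0                    # assumed service-level latency objective
MIN_CAP_W, MAX_CAP_W = 60.0, 120.0  # assumed per-server power-cap range

def measured_tail_latency_ms(cap_w: float) -> float:
    """Toy stand-in for real latency telemetry: lower caps mean slower servers."""
    return 40.0 * (MAX_CAP_W / cap_w) + random.uniform(-2.0, 2.0)

def step(cap_w: float) -> float:
    """One control iteration: raise the cap quickly on SLO risk, trim it slowly otherwise."""
    latency = measured_tail_latency_ms(cap_w)
    if latency > TARGET_MS:
        return min(MAX_CAP_W, cap_w * 1.10)  # latency too high: speed the server up
    return max(MIN_CAP_W, cap_w * 0.99)      # slack available: shave power gradually

cap = MAX_CAP_W
for _ in range(200):
    cap = step(cap)
print(f"cap settled near {cap:.0f} W")  # just enough power to meet the target
```

    The asymmetric step (fast up, slow down) reflects the general shape of such controllers: violating the latency objective is costly, so the controller recovers aggressively but harvests savings cautiously.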
    The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition
    Luiz André Barroso
    Jimmy Clidaras
    Urs Hölzle
    Morgan & Claypool Publishers (2013)
    As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today’s WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today’s WSCs on a single board.
    Notes for the Second Edition: After nearly four years of substantial academic and industrial developments in warehouse-scale computing, we are delighted to present our first major update to this lecture. The increased popularity of public clouds has made WSC software techniques relevant to a larger pool of programmers since our first edition. Therefore, we expanded Chapter 2 to reflect our better understanding of WSC software systems and the toolbox of software techniques for WSC programming. In Chapter 3, we added to our coverage of the evolving landscape of wimpy vs. brawny server trade-offs, and we now present an overview of WSC interconnects and storage systems that was promised but lacking in the original edition. Thanks largely to the help of our new co-author, Google Distinguished Engineer Jimmy Clidaras, the material on facility mechanical and power distribution design has been updated and greatly extended (see Chapters 4 and 5). Chapters 6 and 7 have also been revamped significantly. We hope this revised edition continues to meet the needs of educators and professionals in this area.
    The Tail at Scale
    Luiz André Barroso
    Communications of the ACM, 56 (2013), pp. 74-80
    Systems that respond to user actions very quickly (within 100 milliseconds) feel more fluid and natural to users than those that take longer [Card et al 1991]. Improvements in Internet connectivity and the rise of warehouse-scale computing systems [Barroso & Hoelzle 2009] have enabled Web services that provide fluid responsiveness while consulting multi-terabyte datasets that span thousands of servers. For example, the Google search system now updates query results interactively as the user types, predicting the most likely query based on the prefix typed so far, performing the search, and showing the results within a few tens of milliseconds. Emerging augmented reality devices such as the Google Glass prototype will need associated Web services with even greater computational needs while guaranteeing seamless interactivity. It is challenging to keep the tail of the latency distribution low for interactive services as the size and complexity of the system scales up or as overall utilization increases. Temporary high-latency episodes that are unimportant in moderate-size systems may come to dominate overall service performance at large scale. Just as fault-tolerant computing aims to create a reliable whole out of less reliable parts, we suggest that large online services need to create a predictably responsive whole out of less predictable parts. We refer to such systems as latency tail-tolerant, or tail-tolerant for brevity. This article outlines some of the common causes of high-latency episodes in large online services and describes techniques that reduce their severity or mitigate their impact on whole-system performance. In many cases, tail-tolerant techniques can take advantage of resources already deployed to achieve fault tolerance, resulting in low additional overheads. We show that these techniques allow system utilization to be driven higher without lengthening the latency tail, avoiding wasteful over-provisioning.
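    One of the techniques the article describes is the hedged request: issue a request to one replica and, if no answer arrives within roughly the 95th-percentile expected latency, send a backup copy to a second replica and use whichever response comes first. A minimal sketch, where `query_replica` and the replica list are hypothetical stand-ins for a real RPC layer:

```python
import concurrent.futures as cf

def query_replica(replica: str, request: str) -> str:
    """Hypothetical stand-in for an RPC to one replica."""
    raise NotImplementedError

def hedged_request(request: str, replicas: list[str], hedge_after_s: float) -> str:
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        primary = pool.submit(query_replica, replicas[0], request)
        try:
            # Fast path: the primary replica answers before the hedging deadline.
            return primary.result(timeout=hedge_after_s)
        except cf.TimeoutError:
            # Slow path: fire a backup request and take the first answer.
            backup = pool.submit(query_replica, replicas[1], request)
            done, _ = cf.wait({primary, backup}, return_when=cf.FIRST_COMPLETED)
            return done.pop().result()
    finally:
        pool.shutdown(wait=False)  # do not wait for the slower copy
```

    The article notes that choosing a hedging delay around the 95th-percentile latency bounds the extra load to a few percent of requests while sharply shortening the tail.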
    Power Management of Online Data-Intensive Services
    David Meisner
    Christopher M. Sadler
    Luiz André Barroso
    Thomas F. Wenisch
    Proceedings of the 38th ACM International Symposium on Computer Architecture (2011)
    Much of the success of the Internet services model can be attributed to the popularity of a class of workloads that we call Online Data-Intensive (OLDI) services. These workloads perform significant computing over massive data sets per user request but, unlike their offline counterparts (such as MapReduce computations), they require responsiveness in the sub-second time scale at high request rates. Large search products, online advertising, and machine translation are examples of workloads in this class. Although the load in OLDI services can vary widely during the day, their energy consumption sees little variance due to the lack of energy proportionality of the underlying machinery. The scale and latency sensitivity of OLDI workloads also make them a challenging target for power management techniques. We investigate what, if anything, can be done to make OLDI systems more energy-proportional. Specifically, we evaluate the applicability of active and idle low-power modes to reduce the power consumed by the primary server components (processor, memory, and disk), while maintaining tight response time constraints, particularly on 95th-percentile latency. Using Web search as a representative example of this workload class, we first characterize a production Web search workload at cluster-wide scale. We provide a fine-grain characterization and expose the opportunity for power savings using low-power modes of each primary server component. Second, we develop and validate a performance model to evaluate the impact of processor- and memory-based low-power modes on the search latency distribution and consider the benefit of current and foreseeable low-power modes. Our results highlight the challenges of power management for this class of workloads. In contrast to other server workloads, for which idle low-power modes have shown great promise, for OLDI workloads we find that energy-proportionality with acceptable query latency can only be achieved using coordinated, full-system active low-power modes.
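    The energy-proportionality gap the abstract refers to is easy to see with a toy linear power model (the numbers below are assumed, not from the paper): a server that idles at half its peak power spends several times more energy per query at the low utilizations where these clusters typically operate.

```python
P_IDLE_W, P_PEAK_W = 100.0, 200.0  # assumed server power envelope
PEAK_QPS = 1000.0                  # assumed throughput at full load

def power_w(utilization: float) -> float:
    """Simple linear power model: idle floor plus a load-proportional term."""
    return P_IDLE_W + (P_PEAK_W - P_IDLE_W) * utilization

for u in (1.0, 0.5, 0.2):
    joules_per_query = power_w(u) / (PEAK_QPS * u)
    print(f"utilization {u:4.0%}: {power_w(u):5.0f} W, {joules_per_query:.2f} J/query")
```

    At 20% utilization this toy server burns three times the energy per query it does at full load, which is why the paper focuses on low-power modes that lower the idle floor without violating tail-latency constraints.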
    The Future of Computing Performance: Game Over or Next Level?
    Samuel H. Fuller
    Luiz André Barroso
    Robert P. Colwell
    William J. Dally
    Dan Dobberpuhl
    Pradeep Dubey
    Mark D. Hill
    Mark Horowitz
    David Kirk
    Monica Lam
    Kathryn S. McKinley
    Charles Moore
    Katherine Yelick
    The National Academies Press (2011), 200 pp.
    The end of dramatic exponential growth in single-processor performance marks the end of the dominance of the single microprocessor in computing. The era of sequential computing must give way to a new era in which parallelism is at the forefront. Although important scientific and engineering challenges lie ahead, this is an opportune time for innovation in programming systems and computing architectures. We have already begun to see diversity in computer designs to optimize for such considerations as power and throughput. The next generation of discoveries is likely to require advances at both the hardware and software levels of computing systems. There is no guarantee that we can make parallel computing as common and easy to use as yesterday's sequential single-processor computer systems, but unless we aggressively pursue efforts suggested by the recommendations in this book, it will be "game over" for growth in computing performance. If parallel programming and related software efforts fail to become widespread, the development of exciting new applications that drive the computer industry will stall; if such innovation stalls, many other parts of the economy will follow suit. The Future of Computing Performance describes the factors that have led to the future limitations on growth for single processors that are based on complementary metal oxide semiconductor (CMOS) technology. It explores challenges inherent in parallel computing and architecture, including ever-increasing power consumption and the escalated requirements for heat dissipation. The book delineates a research, practice, and education agenda to help overcome these challenges. The Future of Computing Performance will guide researchers, manufacturers, and information technology professionals in the right direction for sustainable growth in computer performance, so that we may all enjoy the next level of benefits to society.
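    The report's concern about parallelism can be made concrete with Amdahl's law, the standard bound (not specific to this report) on how the non-parallelizable fraction of a program limits speedup no matter how many cores are added:

```python
def speedup(serial_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only the parallel fraction scales with cores."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for cores in (4, 16, 64, 1024):
    print(f"{cores:5d} cores, 5% serial work: {speedup(0.05, cores):5.1f}x speedup")
```

    Even a modest 5% serial fraction caps the speedup below 20x regardless of core count, which is why the report presses for advances in parallel programming systems rather than hardware alone.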
    Warehouse-scale Computing: entering the teenage decade
    Luiz André Barroso
    Association for Computing Machinery (2011)
    Video recording of a plenary talk delivered at the 2011 ACM Federated Computing Research Conference, focusing on some important challenges awaiting programmers and designers of Warehouse-scale Computers as the field enters its second decade. June 8, 2011, San Jose, CA.
    FAWN: a fast array of wimpy nodes: technical perspective
    Luiz André Barroso
    Communications of the ACM, 54 (2011), p. 100