
Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.



Showing 1–15 of 318 publications
    On the Benefits of Traffic “Reprofiling”: The Single Hop Case
    Henry Sariowan
    Jiaming Qiu
    Jiayi Song
    Roch Guerin
    IEEE/ACM Transactions on Networking (2024)
    Abstract: Datacenters have become a significant source of traffic, much of which is carried over private networks. The operators of those networks commonly have access to detailed traffic profiles and performance goals, which they seek to meet as efficiently as possible. Of interest are solutions that guarantee latency while minimizing network bandwidth. The paper explores a basic building block towards realizing such solutions, namely, a single hop configuration. The main results are in the form of optimal solutions for meeting local deadlines under schedulers of varying complexity and therefore cost. The results demonstrate how judiciously modifying flows’ traffic profiles, i.e., reprofiling them, can help simple schedulers reduce the bandwidth they require, often performing nearly as well as more complex ones.
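    As a rough illustration of the idea (not the paper's actual formulation), the toy network-calculus sketch below compares the FIFO service rate needed to meet per-flow deadlines for two token-bucket flows, with and without reshaping the burst of the flow that has deadline slack. The bounds used (aggregate burst over rate for FIFO delay, burst reduction over rate for shaping delay) are standard textbook bounds; the flow parameters are made up.

```python
# Toy single-hop sketch: two token-bucket flows (burst in Mb, rate in Mbps,
# deadline in seconds) sharing one FIFO link. Illustrative only.

def fifo_rate_needed(bursts, rates, deadlines):
    """Smallest rate R such that the aggregate FIFO delay bound sum(bursts)/R
    meets every flow's deadline, with R >= sum(rates) for stability."""
    return max(sum(bursts) / min(deadlines), sum(rates))

# Flow 1 has a large burst but lots of deadline slack; flow 2 is tight.
b1, r1, d1 = 8.0, 10.0, 1.0     # 8 Mb burst, 10 Mbps, 1 s deadline
b2, r2, d2 = 2.0, 10.0, 0.02    # 2 Mb burst, 10 Mbps, 20 ms deadline

no_reprofiling = fifo_rate_needed([b1, b2], [r1, r2], [d1, d2])

# Reprofile flow 1: reshape its burst down to b1_new before the hop, paying a
# shaping delay of (b1 - b1_new)/r1 that eats into flow 1's deadline slack.
b1_new = 0.5
shaping_delay = (b1 - b1_new) / r1              # 0.75 s, within flow 1's slack
with_reprofiling = fifo_rate_needed(
    [b1_new, b2], [r1, r2], [d1 - shaping_delay, d2])

print(f"FIFO rate without reprofiling: {no_reprofiling:.0f} Mbps")   # 500 Mbps
print(f"FIFO rate with reprofiling:    {with_reprofiling:.0f} Mbps") # 125 Mbps
```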
    Abstract: This is an invited OFC 2024 conference workshop talk regarding a new type of lower-power datacenter optics design choice: linear pluggable optics. In this talk I discuss the fundamental performance constraints facing linear pluggable optics and their implications for DCN and ML use cases.
    Abstract: Three prominent traffic features, including peak alignment, stable ranking, and the gravity model, have guided the design of the current Google Jupiter fabrics in traffic engineering, topology engineering, and capacity planning.
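    For context on one of these features, a gravity model estimates the demand between a source and a destination as proportional to their total outbound and inbound volumes. The sketch below is a minimal illustration with made-up block names and numbers, not Jupiter data.

```python
# Minimal gravity-model sketch: estimate a traffic matrix from per-block
# totals, T[src][dst] ~ out_src * in_dst / total. Numbers are illustrative.

def gravity_matrix(outbound, inbound):
    total = sum(outbound.values())
    return {
        (src, dst): outbound[src] * inbound[dst] / total
        for src in outbound
        for dst in inbound
    }

outbound = {"blockA": 40.0, "blockB": 30.0, "blockC": 30.0}   # egress totals
inbound  = {"blockA": 50.0, "blockB": 25.0, "blockC": 25.0}   # ingress totals

for (src, dst), demand in sorted(gravity_matrix(outbound, inbound).items()):
    print(f"{src} -> {dst}: {demand:.1f}")
```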
    Change Management in Physical Network Lifecycle Automation
    Virginia Beauregard
    Kevin Grant
    Angus Griffith
    Jahangir Hasan
    Chen Huang
    Quan Leng
    Jiayao Li
    Alexander Lin
    Zhoutao Liu
    Ahmed Mansy
    Bill Martinusen
    Nikil Mehta
    Andrew Narver
    Anshul Nigham
    Melanie Obenberger
    Sean Smith
    Kurt Steinkraus
    Sheng Sun
    Edward Thiele
    Proc. 2023 USENIX Annual Technical Conference (USENIX ATC 23)
    Abstract: Automated management of a physical network's lifecycle is critical for large networks. At Google, we manage network design, construction, evolution, and management via multiple automated systems. In our experience, one of the primary challenges is to reliably and efficiently manage change in this domain -- additions of new hardware and connectivity, planning and sequencing of topology mutations, introduction of new architectures, new software systems and fixes to old ones, etc. We have especially learned the importance of supporting multiple kinds of change in parallel without conflicts or mistakes (which cause outages) while also maintaining parallelism between different teams and between different processes. We now know that this requires automated support. This paper describes some of our network lifecycle goals, the automation we have developed to meet those goals, and the change-management challenges we encountered. We then discuss in detail our approaches to several specific kinds of change management: (1) managing conflicts between multiple operations on the same network; (2) managing conflicts between operations spanning the boundaries between networks; (3) managing representational changes in the models that drive our automated systems. These approaches combine both novel software systems and software-engineering practices. While this paper reports on our experience with large-scale datacenter network infrastructures, we are also applying the same tools and practices in several adjacent domains, such as the management of WAN systems, of machines, and of datacenter physical designs. Our approaches are likely to be useful at smaller scales, too.
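    As a loose illustration of the first kind of conflict management mentioned above (not the system described in the paper), the sketch below rejects a proposed operation whose device footprint overlaps one already claimed by an in-flight operation. The class and method names are invented for the example.

```python
# Hedged sketch of footprint-based conflict detection between concurrent
# network operations: each operation claims the devices it will touch, and
# overlapping claims are rejected instead of being allowed to collide.

class ConflictError(Exception):
    pass

class ChangeRegistry:
    def __init__(self):
        self._claims = {}          # device name -> operation id

    def begin(self, op_id: str, footprint: set[str]) -> None:
        conflicts = {d: self._claims[d] for d in footprint if d in self._claims}
        if conflicts:
            raise ConflictError(f"{op_id} conflicts with in-flight ops: {conflicts}")
        for device in footprint:
            self._claims[device] = op_id

    def finish(self, op_id: str) -> None:
        self._claims = {d: o for d, o in self._claims.items() if o != op_id}

registry = ChangeRegistry()
registry.begin("upgrade-42", {"spine-1", "spine-2"})
try:
    registry.begin("drain-7", {"spine-2", "tor-9"})   # overlaps spine-2 -> rejected
except ConflictError as err:
    print(err)
```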
    Abstract: We introduce logical synchrony, a framework that allows distributed computing to be coordinated as tightly as with pure synchrony, without the distribution of a global clock or any reference to a universal time. We describe and prove the main properties of the framework and point to how processes can be executed on a logically synchronous system.
    CAPA: An Architecture For Operating Cluster Networks With High Availability
    Bingzhe Liu
    Mukarram Tariq
    Omid Alipourfard
    Rich Alimi
    Deepak Arulkannan
    Virginia Beauregard
    Patrick Conner
    Brighten Godfrey
    Xander Lin
    Mayur Patel
    Joon Ong
    Amr Sabaa
    Alex Smirnov
    Manish Verma
    Prerepa Viswanadham
    Google, 1600 Amphitheatre Pkwy, Mountain View, CA 94043 (2023)
    Abstract: Management operations are a major source of outages for networks. A number of best practices designed to reduce and mitigate such outages are well known, but their enforcement has been challenging, leaving the network vulnerable to inadvertent mistakes and gaps that repeatedly result in outages. We present our experiences with CAPA, Google’s “containment and prevention architecture” for regulating management operations on our cluster networking fleet. Our goal with CAPA is to limit the systems where strict adherence to best practices is required, so that availability of the network is not dependent on the good intentions of every engineer and operator. We enumerate the features of CAPA that we have found necessary to effectively enforce best practices within a thin “regulation” layer. We evaluate CAPA based on case studies of outages prevented, counterfactual analysis of past incidents, and known limitations. Management-plane-related outages have been substantially reduced in both frequency and severity, with an 82% reduction in the cumulative duration of incidents, normalized to fleet size, over five years.
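    The "thin regulation layer" idea can be pictured as a mandatory gate that every management operation must pass before it touches the network. The sketch below is a made-up illustration of such a gate; the checks, thresholds, and field names are our assumptions, not CAPA's actual rules or API.

```python
# Hedged sketch of a pre-operation "regulation layer": every management
# operation is checked against simple containment rules before execution.
# The rules and the 5% blast-radius limit are invented for illustration.

from dataclasses import dataclass

@dataclass
class Operation:
    op_id: str
    target_devices: list[str]
    fleet_size: int
    validated_in_canary: bool

MAX_BLAST_RADIUS = 0.05   # at most 5% of the fleet per operation (made-up limit)

def admit(op: Operation) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed management operation."""
    if not op.validated_in_canary:
        return False, "operation has not been exercised on a canary slice"
    if len(op.target_devices) / op.fleet_size > MAX_BLAST_RADIUS:
        return False, "blast radius exceeds the per-operation limit"
    return True, "ok"

ok, reason = admit(Operation("push-1", [f"sw-{i}" for i in range(600)], 10000, True))
print(ok, reason)   # False: 6% of the fleet exceeds the 5% limit
```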
    Abstract: Bolt is a congestion-control algorithm designed to provide single-digit microsecond tail network queuing at near-line-rate utilization. Motivated by the need for ultra-low latency to support applications such as NVMe, as line rates reach 200G and beyond, most transfers fit within a single BDP, entailing that transfer times predominantly become a function of queuing and propagation delays. Bolt is an attempt to push congestion control to its theoretical limits by harnessing the power of programmable dataplanes such as Tofino and Trident3+ chips. Bolt is founded on three key ideas: (i) Sub-RTT Reaction (SRR): reacting to congestion faster than the RTT control-loop delay; (ii) Proactive Ramp-up (PRU): ramping up ahead of time by tracking future flow completions; and (iii) Supply Matching (SM): leveraging Network Calculus concepts to maximize utilization. Our current results achieve a 75% reduction in queuing delays over Swift with up to 3x improvement in completion times for short transfers.
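    To make the sub-RTT idea concrete, the caricature below reacts to a per-packet congestion notification generated in the data plane instead of waiting for a full control-loop RTT. It is a simplified sketch we wrote for illustration, not Bolt's actual algorithm, constants, or packet format.

```python
# Caricature of sub-RTT reaction: the switch marks packets that saw a queue
# above a threshold, and the sender shrinks its window as soon as such a
# notification arrives rather than once per RTT. Constants are invented.

MTU_BYTES = 1514
MIN_CWND_BYTES = 4 * MTU_BYTES       # floor: a few MTUs

class SubRttSender:
    def __init__(self, cwnd_bytes: int):
        self.cwnd = cwnd_bytes

    def on_packet_feedback(self, queue_exceeded_threshold: bool) -> None:
        """Called for every acknowledged packet, i.e., at sub-RTT granularity."""
        if queue_exceeded_threshold:
            self.cwnd = max(MIN_CWND_BYTES, self.cwnd - MTU_BYTES)
        else:
            # roughly +1 MTU per RTT, spread across the packets of a window
            self.cwnd += (MTU_BYTES * MTU_BYTES) // self.cwnd

sender = SubRttSender(cwnd_bytes=64 * MTU_BYTES)
for congested in [False, False, True, True, False]:
    sender.on_packet_feedback(congested)
print(sender.cwnd)
```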
    Abstract: We review state-of-the-art datacenter technologies for 800G, 1.6T, and beyond interconnect speeds, focusing on 200G-per-lane IM-DD (intensity-modulated direct-detect) and 800G-LR1 coherent-lite transmissions.
    Improving Network Availability with Protective ReRoute
    Abdul Kabbani
    Van Jacobson
    Jim Winget
    Brad Morrey
    Uma Parthavi Moravapalle
    Steven Knight
    SIGCOMM 2023
    Abstract: We present PRR (Protective ReRoute), a transport technique for shortening user-visible outages that complements routing repair. It can be added to any transport to provide benefits in multipath networks. PRR responds to flow connectivity failure signals, e.g., retransmission timeouts, by changing the FlowLabel on packets of the flow, which causes switches and hosts to choose a different network path that may avoid the outage. To enable it, we shifted our IPv6 network architecture to use the FlowLabel, so that hosts can change the paths of their flows without application involvement. PRR is deployed fleetwide at Google for TCP and Pony Express, where it has been protecting all production traffic for several years. It is also available to our Cloud customers. We find it highly effective for real outages. In a measurement study on our network backbones, adding PRR reduced the cumulative region-pair outage time for RPC traffic by 63--84%. This is the equivalent of adding 0.4--0.8 "nines" of availability.
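    The mechanism described above, re-hashing a flow onto a different path by changing its IPv6 FlowLabel after a connectivity-failure signal, can be sketched roughly as follows. This is a conceptual illustration of host-side behavior under the assumptions that switches include the FlowLabel in their ECMP hash and that the transport reports its timeouts; the threshold and class names are ours, not the deployed implementation.

```python
# Conceptual sketch of Protective ReRoute host logic: when a flow hits a
# retransmission timeout, pick a new random 20-bit IPv6 FlowLabel so that
# FlowLabel-aware ECMP hashing moves the flow onto a different path.

import random

FLOWLABEL_BITS = 20
RTO_REPATH_THRESHOLD = 1      # how many timeouts before trying a new path

class FlowState:
    def __init__(self):
        self.flow_label = random.getrandbits(FLOWLABEL_BITS)
        self.consecutive_rtos = 0

    def on_ack(self) -> None:
        self.consecutive_rtos = 0

    def on_retransmission_timeout(self) -> None:
        self.consecutive_rtos += 1
        if self.consecutive_rtos >= RTO_REPATH_THRESHOLD:
            # Re-label the flow; downstream ECMP will likely hash it onto a
            # different (hopefully healthy) path through the multipath fabric.
            self.flow_label = random.getrandbits(FLOWLABEL_BITS)
            self.consecutive_rtos = 0

flow = FlowState()
old_label = flow.flow_label
flow.on_retransmission_timeout()
print(f"flow label changed: {old_label != flow.flow_label}")
```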
    Abstract: The difficulty of gaining visibility into the fine-timescale, hop-level congestion state of networks has been a key challenge faced by congestion-control protocols for decades. However, the emergence of commodity switches supporting in-network telemetry (INT) enables more advanced congestion control. In this paper, we present Poseidon, a novel congestion-control protocol that exploits INT to address blind spots of end-to-end algorithms and realize several fundamentally advantageous properties. Specifically, Poseidon realizes congestion control for the actual bottleneck hop. In the steady state, Poseidon realizes network-wide max-min fair bandwidth allocation. Furthermore, Poseidon decouples the bandwidth-fairness requirement from the traditional AIMD control law, making it possible for Poseidon to converge fast and smooth out bandwidth oscillations. Equally important, Poseidon is designed to be amenable to incremental brownfield deployment in networks that mix INT and non-INT switches. Our testbed and simulation experiments show that, compared to Swift, a widely deployed state-of-the-art non-INT protocol, Poseidon improves op latency by up to 10x at some percentiles (61% on average), lowers fabric RTT by more than 50%, reduces congestion-window ramp-up time by 40%, and decreases the throughput variation for flows with small windows by 94%. Finally, it is robust to reverse-path and multi-hop congestion.
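    As a rough sketch of the per-bottleneck-hop idea (not Poseidon's actual control law), the code below scans the INT metadata carried back by a packet, identifies the most loaded hop, and nudges the sending rate toward that hop's per-flow fair share. The telemetry fields and the adjustment gain are assumptions made for illustration.

```python
# Hedged sketch of INT-driven, bottleneck-hop congestion control: pick the
# most utilized hop from a packet's collected telemetry, then move the rate
# part of the way toward that hop's fair share instead of a blind AIMD cut.

from dataclasses import dataclass

@dataclass
class HopTelemetry:
    hop_id: str
    link_capacity_gbps: float
    utilization: float          # 0.0 - 1.0+, as reported by the switch
    active_flows: int           # approximation exported by the switch

def update_rate(current_rate_gbps: float, path: list[HopTelemetry],
                gain: float = 0.25) -> float:
    bottleneck = max(path, key=lambda h: h.utilization)
    fair_share = bottleneck.link_capacity_gbps / max(1, bottleneck.active_flows)
    # Move a fraction of the way toward the bottleneck's fair share, which
    # damps oscillations compared with a hard multiplicative decrease.
    return current_rate_gbps + gain * (fair_share - current_rate_gbps)

path = [HopTelemetry("tor", 200.0, 0.40, 8),
        HopTelemetry("spine", 400.0, 0.95, 64)]
print(update_rate(current_rate_gbps=12.0, path=path))   # pulled toward 400/64 = 6.25
```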
    Selfish Routing and Link Scheduling in mmWave Backhaul Networks
    Dionysia Triantafyllopoulou
    Klaus Moessner
    IEEE ICC (2023)
    Abstract: In this paper we present and evaluate the performance of a routing and link-scheduling algorithm for millimeter-wave (mmWave) backhaul networks. The proposed algorithm models end-user behavior as selfish, i.e., it considers users that always aim to maximize their individual utility rather than the global optimization objective. Our system uses popular concepts from the economics and fairness literature. Specifically, the Shapley value method is applied to forward packets between the access points that comprise the backhaul network, and it is shown to induce solutions with reduced latency. The performance of the proposed algorithm is evaluated in terms of the total delay, as well as the price of anarchy, which represents the inefficiency of a scheduling policy when users are allowed to adapt their rates selfishly and reach an equilibrium. A relaxed version of the problem is also presented, which provides a lower bound on the value of the optimal solution; this bound is used to calculate the price of anarchy, since finding the optimal solution is NP-hard. According to simulation results, the system that employs the proposed algorithm outperforms, in terms of both delay and price of anarchy, a system that uses a First-In-First-Out (FIFO) packet-forwarding policy, as well as a system that employs local-search global optimization, in which users aim to optimize the overall delay in the network.
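    The Shapley value mentioned above has a standard definition: a player's average marginal contribution over all join orders. The short sketch below computes it by brute force for a tiny cost-sharing example; the cost function and player names are invented and unrelated to the paper's actual backhaul model.

```python
# Brute-force Shapley values for a small cooperative game: each player's value
# is its marginal contribution averaged over all orderings of the players.

from itertools import permutations

def shapley_values(players, value_of_coalition):
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = []
        for player in order:
            before = value_of_coalition(frozenset(coalition))
            coalition.append(player)
            after = value_of_coalition(frozenset(coalition))
            totals[player] += after - before
    return {p: totals[p] / len(orders) for p in players}

# Example cost game: serving any non-empty set of access points costs a fixed
# 10 for the shared link plus 2 per access point (illustrative numbers only).
def cost(coalition: frozenset) -> float:
    return 0.0 if not coalition else 10.0 + 2.0 * len(coalition)

print(shapley_values(["ap1", "ap2", "ap3"], cost))
# Each AP gets an equal share of the fixed cost plus its own 2.0 (~5.33 each).
```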
    RFC 9476 - The .alt Special-Use Top-Level Domain
    Paul Hoffman
    IETF Request for Comments, RFC Editor (2023), 7 pp.
    Abstract: This document reserves a Top-Level Domain (TLD) label "alt" to be used in non-DNS contexts. It also provides advice and guidance to developers creating alternative namespaces.
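    In practice, the reservation means software can recognize names under the reserved label and keep them away from DNS resolution. A minimal sketch of such a check follows; the helper function is ours, not something defined by the RFC, which only reserves the label.

```python
# Minimal sketch of the check RFC 9476 enables: names whose rightmost label is
# "alt" belong to non-DNS namespaces and should not be sent to DNS resolution.

def is_alt_name(domain_name: str) -> bool:
    labels = domain_name.rstrip(".").lower().split(".")
    return labels[-1] == "alt"

for name in ["service.example.com", "myprotocol.alt", "node.chain.ALT."]:
    print(name, "->", "non-DNS (.alt)" if is_alt_name(name) else "resolve via DNS")
```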
    Abstract: We describe our experience with Fathom, a system for identifying the network performance bottlenecks of any service running in the Google fleet. Fathom passively samples RPCs, the principal unit of work for services. It segments the overall latency into host and network components with kernel and RPC stack instrumentation. It records these detailed latency metrics, along with detailed transport connection state, for every sampled RPC. This lets us determine whether completion is constrained by the client, the network, or the server. To scale while enabling analysis, we also aggregate samples into distributions that retain multi-dimensional breakdowns. This provides us with a macroscopic view of individual services. Fathom runs globally in our datacenters for all production traffic, where it monitors billions of TCP connections 24x7. For five years Fathom has been our primary tool for troubleshooting service network issues and assessing network infrastructure changes. We present case studies to show how it has helped us improve our production services.
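    The client/network/server segmentation can be pictured with a small sketch: given timestamps collected at both peers for a sampled RPC, attribute the latency to each component and flag the dominant one. The field names and the attribution rule below are simplified assumptions for illustration, not Fathom's actual instrumentation or schema.

```python
# Simplified sketch of RPC latency segmentation: break a sampled RPC's total
# latency into client-host, server-host, and network components from
# timestamps taken at both ends of the connection.

from dataclasses import dataclass

@dataclass
class RpcSample:
    client_send: float      # app handed the request to the client stack
    wire_out: float         # request left the client host
    server_recv: float      # request arrived at the server stack
    server_send: float      # response handed back to the server stack
    client_recv: float      # response delivered to the client app

def segment(s: RpcSample) -> dict[str, float]:
    network = (s.server_recv - s.wire_out) + (s.client_recv - s.server_send)
    return {
        "client_host": s.wire_out - s.client_send,
        "server_host": s.server_send - s.server_recv,
        "network": network,
        "total": s.client_recv - s.client_send,
    }

sample = RpcSample(0.0, 0.3, 1.1, 5.1, 5.9)     # milliseconds, made up
breakdown = segment(sample)
bottleneck = max(("client_host", "server_host", "network"), key=breakdown.get)
print(breakdown, "dominant:", bottleneck)        # here the server dominates
```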
    Abstract: As with most large-scale migration efforts, the last 20% of Alphabet's BeyondCorp migration required disproportionate effort. After successfully transitioning most of the company's workflows to BeyondCorp, we still had a long tail of specific, oddball, or challenging situations to resolve. This article examines how we created processes, tools, and solutions to handle use cases that were not easily adapted to our core HTTPS-based workflow.
    Physical Deployability Matters
    Proc. HotNets 2023: Twenty-Second ACM Workshop on Hot Topics in Networks
    Abstract: While many network research papers address issues of deployability, with a few exceptions this has been limited to protocol compatibility or switch-resource constraints, such as flow-table sizes. We argue that good network designs must also consider the costs and complexities of deploying the design within the constraints of the physical environment in a datacenter: physical deployability. The traditional metrics of network "goodness" mostly do not account for these costs and constraints, and this may partially explain why some otherwise attractive designs have not been deployed in real-world datacenters.