Publications
Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.
On the Benefits of Traffic “Reprofiling” The Single Hop Case
Henry Sariowan
Jiaming Qiu
Jiayi Song
Roch Guerin
IEEE/ACM Transactions on Networking (2024)
Datacenters have become a significant source of traffic, much of which is carried over private networks. The operators of those networks commonly have access to detailed traffic profiles and performance goals, which they seek to meet as efficiently as possible. Of interest are solutions that guarantee latency while minimizing network bandwidth. The paper explores a basic building block towards realizing such solutions, namely, a single hop configuration. The main results are in the form of optimal solutions for meeting local deadlines under schedulers of varying complexity and therefore cost. The results demonstrate how judiciously modifying flows’ traffic profiles, i.e., reprofiling them, can help simple schedulers reduce the bandwidth they require, often performing nearly as well as more complex ones.
This is an invited OFC 2024 conference workshop talk regarding a new type of lower-power datacenter optics design choice: linear pluggable optics. In this talk I will discuss the fundamental performance constraints facing linear pluggable optics and their implications for DCN and ML use cases.
(Invited) How Traffic Analytics Shapes Traffic Engineering, Topology Engineering, and Capacity Planning of Jupiter
Jianan Zhang
Optical Fiber Communication (OFC) Conference, IEEE (2023)
Three prominent traffic features, including peak alignment, stable ranking, and the gravity model, have guided the design of current Google Jupiter fabrics in traffic engineering, topology engineering, and capacity planning.
Change Management in Physical Network Lifecycle Automation
Virginia Beauregard
Kevin Grant
Angus Griffith
Jahangir Hasan
Chen Huang
Quan Leng
Jiayao Li
Alexander Lin
Zhoutao Liu
Ahmed Mansy
Bill Martinusen
Nikil Mehta
Andrew Narver
Anshul Nigham
Melanie Obenberger
Sean Smith
Kurt Steinkraus
Sheng Sun
Edward Thiele
Proc. 2023 USENIX Annual Technical Conference (USENIX ATC 23)
Automated management of a physical network's lifecycle is critical for large networks. At Google, we manage network design, construction, evolution, and management via multiple automated systems. In our experience, one of the primary challenges is to reliably and efficiently manage change in this domain -- additions of new hardware and connectivity, planning and sequencing of topology mutations, introduction of new architectures, new software systems and fixes to old ones, etc.
We especially have learned the importance of supporting multiple kinds of change in parallel without conflicts or mistakes (which cause outages) while also maintaining parallelism between different teams and between different processes. We now know that this requires automated support.
This paper describes some of our network lifecycle goals, the automation we have developed to meet those goals, and the change-management challenges we encountered. We then discuss in detail our approaches to several specific kinds of change management:
(1) managing conflicts between multiple operations on the same network;
(2) managing conflicts between operations spanning the boundaries between networks;
(3) managing representational changes in the models that drive our automated systems.
These approaches combine both novel software systems and software-engineering practices.
While this paper reports on our experience with large-scale datacenter network infrastructures, we are also applying the same tools and practices in several adjacent domains, such as the management of WAN systems, of machines, and of datacenter physical designs. Our approaches are likely to be useful at smaller scales, too.
We introduce logical synchrony, a framework that allows distributed computing to be coordinated as tightly as with pure synchrony without the distribution of a global clock or any reference to a universal time. We describe and prove the main properties of the framework and point to how processes can be executed on a logically synchronous system.
CAPA: An Architecture For Operating Cluster Networks With High Availability
Bingzhe Liu
Mukarram Tariq
Omid Alipourfard
Rich Alimi
Deepak Arulkannan
Virginia Beauregard
Patrick Conner
Brighten Godfrey
Xander Lin
Mayur Patel
Joon Ong
Amr Sabaa
Alex Smirnov
Manish Verma
Prerepa Viswanadham
Google, 1600 Amphitheatre Pkwy, Mountain View, CA 94043 (2023)
Management operations are a major source of outages for networks. A number of best practices designed to reduce and mitigate such outages are well known, but their enforcement has been challenging, leaving the network vulnerable to inadvertent mistakes and gaps which repeatedly result in outages. We present our experiences with CAPA, Google’s “containment and prevention architecture” for regulating management operations on our cluster networking fleet. Our goal with CAPA is to limit the systems where strict adherence to best practices is required, so that availability of the network is not dependent on the good intentions of every engineer and operator. We enumerate the features of CAPA which we have found to be necessary to effectively enforce best practices within a thin “regulation” layer. We evaluate CAPA based on case studies of outages prevented, counterfactual analysis of past incidents, and known limitations. Management-plane-related outages have been substantially reduced in both frequency and severity, with an 82% reduction in cumulative duration of incidents normalized to fleet size over five years.
Bolt is a congestion-control algorithm designed to provide single-digit microsecond tail network-queuing at near-line-rate utilization. Motivated by the need for ultra-low latency to support applications such as NVMe, as line rates reach 200G and beyond, most transfers fit within a single BDP, entailing that transfer times predominantly become a function of queuing and propagation delays. Bolt is an attempt to push congestion-control to its theoretical limits by harnessing the power of programmable dataplanes such as Tofino and Trident3+ chips. Bolt is founded on three key ideas: (i) Sub-RTT reaction (SRR): reacting to congestion faster than RTT control-loop delay, (ii) Proactive Ramp-up (PRU): by tracking future flow-completions, and (iii) Supply matching (SM): leveraging Network Calculus concepts to maximize utilization. Our current results achieve a 75% reduction in queuing-delays over Swift with up to 3x improvement in completion times for short transfers.
We review state-of-the-art datacenter technologies for 800G, 1.6T and beyond interconnect speeds, focusing on 200G per-lane IM-DD (intensity modulated-direct detect) and 800G-LR1 coherent-lite transmissions.
Improving Network Availability with Protective ReRoute
Abdul Kabbani
Van Jacobson
Jim Winget
Brad Morrey
Uma Parthavi Moravapalle
Steven Knight
SIGCOMM 2023
We present PRR (Protective ReRoute), a transport technique for shortening user-visible outages that complements routing repair. It can be added to any transport to provide benefits in multipath networks. PRR responds to flow connectivity failure signals, e.g., retransmission timeouts, by changing the FlowLabel on packets of the flow, which causes switches and hosts to choose a different network path that may avoid the outage. To enable it, we shifted our IPv6 network architecture to use the FlowLabel, so that hosts can change the paths of their flows without application involvement. PRR is deployed fleetwide at Google for TCP and Pony Express, where it has been protecting all production traffic for several years. It is also available to our Cloud customers. We find it highly effective for real outages. In a measurement study on our network backbones, adding PRR reduced the cumulative region-pair outage time for RPC traffic by 63–84%. This is the equivalent of adding 0.4–0.8 "nines" of availability.
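The conversion from outage-time reduction to "nines" in the abstract above can be checked with a short calculation. This is a minimal sketch, not taken from the paper: it assumes that "nines" means the negative base-10 logarithm of unavailability, and that cutting outage time by a fraction r scales unavailability by (1 - r). The function name `added_nines` is hypothetical.

```python
import math


def added_nines(outage_reduction: float) -> float:
    """Nines of availability gained when outage time drops by `outage_reduction`.

    Assumption: nines = -log10(unavailability), so scaling unavailability
    by (1 - r) adds -log10(1 - r) nines regardless of the starting point.
    """
    if not 0.0 <= outage_reduction < 1.0:
        raise ValueError("outage_reduction must be in [0, 1)")
    return -math.log10(1.0 - outage_reduction)


# The two ends of the 63-84% range quoted in the abstract.
low = added_nines(0.63)
high = added_nines(0.84)
print(f"{low:.2f} to {high:.2f} nines")
```

Under these assumptions the two ends of the range come out at roughly 0.43 and 0.80, consistent with the 0.4–0.8 figure quoted.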
Poseidon: An Efficient Congestion Control using Deployable INT for Data Center Networks
Weitao Wang
Masoud Moshref
T. S. Eugene Ng
NSDI (2023)
The difficulty in gaining visibility into the fine time-scale hop-level congestion state of networks has been a key challenge faced by congestion control protocols for decades. However, the emergence of commodity switches supporting in-network telemetry (INT) enables more advanced congestion control. In this paper, we present Poseidon, a novel congestion control protocol that exploits INT to address blind spots of end-to-end algorithms and realize several fundamentally advantageous properties. Specifically, Poseidon realizes congestion control for the actual bottleneck hop. In the steady state, Poseidon realizes network-wide max-min fair bandwidth allocation. Furthermore, Poseidon decouples the bandwidth fairness requirement from the traditional AIMD control law, making it possible for Poseidon to converge fast and smooth out bandwidth oscillations. Equally important, Poseidon is designed to be amenable to incremental brownfield deployment in networks that mix INT and non-INT switches. Our testbed and simulation experiments show that compared to a widely-deployed state-of-the-art non-INT protocol, Swift, Poseidon improves op latency up to 10x in some percentiles (61% on average), lowers fabric RTT by more than 50%, reduces congestion window ramp up time by 40% while decreasing the throughput variation for flows with small windows by 94%. Finally, it is robust to reverse-path and multi-hop congestion.
In this paper we present and evaluate the performance of a routing and link scheduling algorithm for millimeter wave (mmWave) backhaul networks. The proposed algorithm models the end user behavior as being selfish, i.e., it considers users always aiming to maximize their individual utility, rather than the global optimization objective. Our system utilizes popular concepts from the economics and fairness literature. Specifically, in order to forward packets between the access points that comprise the backhaul network the Shapley value method is applied, which is shown to induce solutions with reduced latency. The performance of the proposed algorithm is evaluated in terms of the total delay, as well as the price of anarchy, which represents the inefficiency of a scheduling policy when users are allowed to adapt their rates in a selfish manner and reach an equilibrium. A relaxed version of the problem is also presented, which provides a lower bound on the value of the optimal solution. This is used for the calculation of the price of anarchy, since the problem of finding the optimal solution is NP-hard. According to simulation results, the system that employs the proposed algorithm outperforms in terms of delay and price of anarchy a system that considers a First-In-First-Out (FIFO) packet forwarding policy, as well as a system that employs local search global optimization, under which users aim at optimizing the overall delay in the network.
This document reserves a Top-Level Domain (TLD) label "alt" to be used in non-DNS contexts. It also provides advice and guidance to developers creating alternative namespaces.
Fathom: Understanding Datacenter Application Network Performance
Junhua Yan
Mubashir Adnan Qureshi
Van Jacobson
Yousuk Seung
Proceedings of ACM SIGCOMM 2023
We describe our experience with Fathom, a system for identifying the network performance bottlenecks of any service running in the Google fleet. Fathom passively samples RPCs, the principal unit of work for services. It segments the overall latency into host and network components with kernel and RPC stack instrumentation. It records these detailed latency metrics, along with detailed transport connection state, for every sampled RPC. This lets us determine if the completion is constrained by the client, network or server. To scale while enabling analysis, we also aggregate samples into distributions that retain multi-dimensional breakdowns. This provides us with a macroscopic view of individual services. Fathom runs globally in our datacenters for all production traffic, where it monitors billions of TCP connections 24x7. For five years Fathom has been our primary tool for troubleshooting service network issues and assessing network infrastructure changes. We present case studies to show how it has helped us improve our production services.
As with most large-scale migration efforts, the last 20% of Alphabet's BeyondCorp migration required disproportionate effort. After successfully transitioning most of the company's workflows to BeyondCorp, we still had a long tail of specific, oddball, or challenging situations to resolve. This article examines how we created processes, tools, and solutions to handle use cases that were not easily adapted to our core HTTPS-based workflow.
While many network research papers address issues of deployability, with a few exceptions, this has been limited to protocol compatibility or switch-resource constraints, such as flow table sizes. We argue that good network designs must also consider the costs and complexities of deploying the design within the constraints of the physical environment in a datacenter: physical deployability. The traditional metrics of network "goodness" mostly do not account for these costs and constraints, and this may partially explain why some otherwise attractive designs have not been deployed in real-world datacenters.