Networking

Networking is central to modern computing, from WANs connecting cell phones to massive data stores, to the data-center interconnects that deliver seamless storage and fine-grained distributed computing. Because our distributed computing infrastructure is a key differentiator for the company, Google has long focused on building network infrastructure to support our scale, availability, and performance needs, and to apply our expertise and infrastructure to solve similar problems for Cloud customers. Our research combines building and deploying novel networking systems at unprecedented scale, with recent work focusing on fundamental questions around data center architecture, cloud virtual networking, and wide-area network interconnects. We helped pioneer the use of Software Defined Networking, the application of ML to networking, and the development of large-scale management infrastructure including telemetry systems. We are also addressing congestion control and bandwidth management, capacity planning, and designing networks to meet traffic demands. We build cross-layer systems to ensure high network availability and reliability. By publishing our findings at premier research venues, we continue to engage both academic and industrial partners to further the state of the art in networked systems.

Recent Publications

On the Benefits of Traffic “Reprofiling” The Multiple Hops Case – Part I
Henry Sariowan
Jiaming Qiu
Jiayi Song
Roch Guerin
IEEE/ACM Transactions on Networking (2024)
Preview abstract Abstract—This paper considers networks where user traffic is regulated through deterministic traffic profiles, e.g. token buckets, and requirescleanguaranteed hard delay bounds. The network’s goal is to minimize the resources it needs to meet those cleanrequirementsbounds. The paper explores how reprofiling, i.e. proactively modifying how user traffic enters the network, can be of benefit. Reprofiling produces “smoother” flows but introduces an up-front access delay that forces tighter network delays. The paper explores this trade-off and demonstrates that, unlike what holds in the single-hop case, reprofiling can be of benefit even when “optimal”cleansophisticated schedulers are available at each hop. View details
KATch: A Fast Symbolic Verifier for NetKAT
Mark Moeller
Jules Jacobs
Olivier Savary Belanger
David Darais
Cole Schlesinger
Nate Foster
Alexandra Silva
Programming Languages and Implementation (PLDI) (2024) (to appear)
Preview abstract We develop new data structures and algorithms for checking verification queries in NetKAT, a domain-specific language for specifying the behavior of network data planes. Our results extend the techniques obtained in prior work on symbolic automata and provide a framework for building efficient and scalable verification tools. We present \KATch, an implementation of these ideas in Scala, including extended logical operators that are useful for expressing network-wide specifications and optimizations that construct a bisimulation quickly or generate a counter-example showing that none exists. We evaluate the performance of our implementation on real-world and synthetic benchmarks, verifying properties such as reachability and slice isolation, typically returning a result in well under a second, which is orders of magnitude faster than previous approaches. View details
On the Benefits of Traffic “Reprofiling” The Single Hop Case
Henry Sariowan
Jiaming Qiu
Jiayi Song
Roch Guerin
IEEE/ACM Transactions on Networking (2024)
Preview abstract Datacenters have become a significant source of traffic, much of which is carried over private networks. The operators of those networks commonly have access to detailed traffic profiles and performance goals, which they seek to meet as efficiently as possible. Of interest are solutions that guarantee latency while minimizing network bandwidth. The paper explores a basic building block towards realizing such solutions, namely, a single hop configuration. The main results are in the form of optimal solutions for meeting local deadlines under schedulers of varying complexity and therefore cost. The results demonstrate how judiciously modifying flows’ traffic profiles, i.e., reprofiling them, can help simple schedulers reduce the bandwidth they require, often performing nearly as well as more complex ones. View details
Preview abstract This is an invited OFC 2024 conference workshop talk regarding a new type of lower-power datacenter optics design choice: linear pluggable optics. In this talk I will discuss the fundamental performance constraints facing linear pluggable optics and their implications on DCN and ML use cases View details
Physical Deployability Matters
Proc. HotNets 2023: Twenty-Second ACM Workshop on Hot Topics in Networks
Preview abstract While many network research papers address issues of deployability, with a few exceptions, this has been limited to protocol compatibility or switch-resource constraints, such as flow table sizes. We argue that good network designs must also consider the costs and complexities of deploying the design within the constraints of the physical environment in a datacenter: \emph{physical} deployability. The traditional metrics of network ``goodness'' mostly do not account for these costs and constraints, and this may partially explain why some otherwise attractive designs have not been deployed in real-world datacenters. View details
Selfish Routing and Link Scheduling in mmWave Backhaul Networks
Dionysia Triantafyllopoulou
Klaus Moessner
IEEE ICC (2023)
Preview abstract In this paper we present and evaluate the performance of a routing and link scheduling algorithm for millimeter wave (mmWave) backhaul networks. The proposed algorithm models the end user behavior as being selfish, i.e., it considers users always aiming to maximize their individual utility, rather than the global optimization objective. Our system utilizes popular concepts from the economics and fairness literature. Specifically, in order to forward packets between the access points that comprise the backhaul network the Shapley value method is applied, which is shown to induce solutions with reduced latency. The performance of the proposed algorithm is evaluated in terms of the total delay, as well as the price of anarchy, which represents the inefficiency of a scheduling policy when users are allowed to adapt their rates in a selfish manner and reach an equilibrium. A relaxed version of the problem is also presented, which provides a lower bound on the value of the optimal solution. This is used for the calculation of the price of anarchy, since the problem of finding the optimal solution is NP-hard. According to simulation results, the system that employs the proposed algorithm outperforms in terms of delay and price of anarchy a system that considers a First-In-First-Out (FIFO) packet forwarding policy, as well as a system that employs local search global optimization, under which users aim at optimizing the overall delay in the network. View details
×