David Wetherall

David Wetherall is a Distinguished Engineer working on low-level networking issues in Google's datacenter networks, including congestion control, load balancing, reliability, offload, and performance for HPC and ML workloads. Before joining Google in 2014, he was a Professor at the University of Washington and Director of Intel's Seattle Research Lab. He is a Fellow of the ACM and IEEE, and a co-author of the textbook "Computer Networks".

Authored Publications
    Abstract: We describe our experience with Fathom, a system for identifying the network performance bottlenecks of any service running in the Google fleet. Fathom passively samples RPCs, the principal unit of work for services. It segments the overall latency into host and network components with kernel and RPC stack instrumentation. It records these detailed latency metrics, along with detailed transport connection state, for every sampled RPC. This lets us determine if the completion is constrained by the client, network or server. To scale while enabling analysis, we also aggregate samples into distributions that retain multi-dimensional breakdowns. This provides us with a macroscopic view of individual services. Fathom runs globally in our datacenters for all production traffic, where it monitors billions of TCP connections 24x7. For five years Fathom has been our primary tool for troubleshooting service network issues and assessing network infrastructure changes. We present case studies to show how it has helped us improve our production services.
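The Fathom abstract above describes segmenting each sampled RPC's completion time into host and network components and then aggregating samples into per-service distributions. The sketch below illustrates that idea in Python under assumed instrumentation points; the timestamp names, breakdown keys, and aggregation scheme are hypothetical illustrations, not Fathom's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class RpcSample:
    """Timestamps (microseconds) for one sampled RPC, from hypothetical
    kernel/RPC-stack instrumentation points."""
    client_app_send: float   # request handed to the client RPC stack
    client_wire_send: float  # request left the client host
    server_wire_recv: float  # request arrived at the server host
    server_wire_send: float  # response left the server host
    client_wire_recv: float  # response arrived at the client host
    client_app_recv: float   # response delivered to the client application

def breakdown(s: RpcSample) -> dict:
    """Segment one RPC's completion time into host and network components."""
    return {
        "client_host": (s.client_wire_send - s.client_app_send)
                       + (s.client_app_recv - s.client_wire_recv),
        "network": (s.server_wire_recv - s.client_wire_send)
                   + (s.client_wire_recv - s.server_wire_send),
        "server_host": s.server_wire_send - s.server_wire_recv,
        "total": s.client_app_recv - s.client_app_send,
    }

# Aggregate per-(service, component) rather than retaining raw samples,
# in the spirit of the multi-dimensional distributions the abstract mentions.
aggregates = defaultdict(list)

def record(service: str, sample: RpcSample) -> None:
    for component, latency_us in breakdown(sample).items():
        aggregates[(service, component)].append(latency_us)

record("frontend-service", RpcSample(0, 20, 120, 540, 640, 660))
for key, values in aggregates.items():
    print(key, values)
```

Keeping only aggregated distributions per (service, component), rather than every raw sample, is what lets a system of this kind retain a macroscopic view of each service at fleet scale.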
    Improving Network Availability with Protective ReRoute
    Abdul Kabbani
    Van Jacobson
    Jim Winget
    Brad Morrey
    Uma Parthavi Moravapalle
    Steven Knight
    SIGCOMM 2023
    Abstract: We present PRR (Protective ReRoute), a transport technique for shortening user-visible outages that complements routing repair. It can be added to any transport to provide benefits in multipath networks. PRR responds to flow connectivity failure signals, e.g., retransmission timeouts, by changing the FlowLabel on packets of the flow, which causes switches and hosts to choose a different network path that may avoid the outage. To enable it, we shifted our IPv6 network architecture to use the FlowLabel, so that hosts can change the paths of their flows without application involvement. PRR is deployed fleetwide at Google for TCP and Pony Express, where it has been protecting all production traffic for several years. It is also available to our Cloud customers. We find it highly effective for real outages. In a measurement study on our network backbones, adding PRR reduced the cumulative region-pair outage time for RPC traffic by 63–84%. This is the equivalent of adding 0.4–0.8 "nines" of availability.
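As a rough illustration of the repathing mechanism described above, the sketch below repicks a flow's IPv6 FlowLabel when a retransmission timeout fires, so that ECMP hashing selects a different path. The Flow class and its hooks are hypothetical stand-ins; in the deployed system this logic lives inside the transport (TCP, Pony Express) and requires no application involvement.

```python
import random

FLOWLABEL_BITS = 20  # the IPv6 FlowLabel is a 20-bit field

class Flow:
    """Hypothetical per-flow state; real deployments keep this in the transport."""

    def __init__(self):
        self.flow_label = random.getrandbits(FLOWLABEL_BITS)

    def on_retransmission_timeout(self):
        # Connectivity-failure signal: repick the label so that ECMP hashing,
        # which includes the FlowLabel, very likely maps the flow to a new path.
        old = self.flow_label
        while self.flow_label == old:
            self.flow_label = random.getrandbits(FLOWLABEL_BITS)

    def stamp(self, packet: dict) -> dict:
        """Attach the current label to an outgoing (mock) packet."""
        packet["ipv6_flow_label"] = self.flow_label
        return packet

flow = Flow()
print("initial label:", flow.flow_label)
flow.on_retransmission_timeout()   # e.g., an RTO during a partial outage
print("repathed label:", flow.flow_label)
print(flow.stamp({"payload": b"retry"}))
```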
    PLB: Congestion Signals are Simple and Effective for Network Load Balancing
    Abdul Kabbani
    Junhua Yan
    Kira Yin
    Masoud Moshref
    Mubashir Adnan Qureshi
    Qiaobin Fu
    Van Jacobson
    SIGCOMM 2022
    Abstract: We describe our experience with PLB, a host-based load balancing design for modern networks. PLB randomly changes the paths of connections that experience congestion, preferring idle periods to minimize transport interactions. It does so by changing the IPv6 FlowLabel on the packets of a connection, which switches include as part of the ECMP flow hash. Across many hosts, this action drives down the number of hotspots in the network, while separating short RPCs from elephant flows to keep their completion time low. We use PLB fleetwide in Google networks for TCP and PonyExpress (RDMA-like) traffic. We find it to be simple, general, and effective. It was easy to deploy, co-existing with other traffic, requiring only small transport modifications and little of switches, and needing no application changes. And it has produced large gains across the board, for multiple transports and from datacenter through backbone networks. After deploying PLB, the median utilization imbalance of busy switches in Google datacenter networks fell by 60% and packet drops correspondingly fell by 33%. At hosts, the tail latency (99th percentile) of short RPCs fell by up to 25%.
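The sketch below illustrates the kind of decision logic the PLB abstract describes: repath a connection only after sustained congestion, and apply the change during an idle period. The thresholds, the ECN-fraction congestion test, and the class structure are illustrative assumptions rather than the deployed parameters.

```python
import random

ECN_MARK_FRACTION_THRESHOLD = 0.5    # assumed: a round is "congested" above this
CONGESTED_ROUNDS_BEFORE_REPATH = 3   # assumed: sustained congestion, not a blip

class PlbConnection:
    def __init__(self):
        self.flow_label = random.getrandbits(20)   # 20-bit IPv6 FlowLabel
        self.congested_rounds = 0
        self.repath_pending = False

    def on_round_end(self, ecn_marked: int, delivered: int) -> None:
        """Called once per round trip with that round's delivery statistics."""
        congested = (delivered > 0 and
                     ecn_marked / delivered > ECN_MARK_FRACTION_THRESHOLD)
        self.congested_rounds = self.congested_rounds + 1 if congested else 0
        if self.congested_rounds >= CONGESTED_ROUNDS_BEFORE_REPATH:
            self.repath_pending = True     # defer the change to an idle period
            self.congested_rounds = 0

    def on_idle(self) -> None:
        """Idle period: a safe moment to repath without reordering in-flight data."""
        if self.repath_pending:
            self.flow_label = random.getrandbits(20)
            self.repath_pending = False

conn = PlbConnection()
for _ in range(3):
    conn.on_round_end(ecn_marked=80, delivered=100)   # heavily marked rounds
conn.on_idle()
print("repathed to label:", conn.flow_label)
```

Deferring the repath to an idle period is what keeps the path change from reordering packets already in flight, which is why the abstract notes that PLB needs only small transport modifications.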
    Abstract: We report on experiences deploying Swift congestion control in Google datacenters. Swift relies on hardware timestamps in modern NICs, and is based on AIMD control with a specified end-to-end delay target. This simple design is an evolution of earlier protocols used at Google. It has emerged as a foundation for excellent performance, when network distances are well-known, that helps to meet operational challenges. Delay is easy to decompose into fabric and host components to separate concerns, and effortless to deploy and maintain as a signal from switches in changing datacenter environments. With Swift, we obtain low flow completion times for short RPCs, even at the 99th percentile, while providing high throughput for long RPCs. At datacenter scale, Swift achieves 50 μs tail latencies for short RPCs while sustaining a 100 Gbps throughput per server, a load close to 100%. This is much better than protocols such as DCTCP that degrade latency and loss at high utilization.
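As a simplified illustration of the AIMD-with-delay-target idea in the Swift abstract, the sketch below grows the congestion window additively while measured end-to-end delay is under the target and backs off multiplicatively in proportion to the overshoot. The constants and the exact update form are assumptions for illustration; the deployed algorithm includes further mechanisms not shown here.

```python
def swift_like_update(cwnd: float, delay_us: float, target_us: float,
                      ai: float = 1.0, beta: float = 0.8,
                      max_mdf: float = 0.5) -> float:
    """One congestion-window update from a single end-to-end delay measurement.
    ai, beta, and max_mdf are illustrative constants, not deployed values."""
    if delay_us < target_us:
        return cwnd + ai                       # additive increase under the target
    # Multiplicative decrease scaled by how far delay overshoots the target,
    # capped so one bad measurement cannot collapse the window.
    overshoot = (delay_us - target_us) / delay_us
    return cwnd * max(1.0 - beta * overshoot, 1.0 - max_mdf)

cwnd = 10.0
for delay in [40, 45, 60, 120, 80, 45]:        # measured delays (us), target 50 us
    cwnd = swift_like_update(cwnd, delay, target_us=50)
    print(f"delay={delay}us -> cwnd={cwnd:.2f}")
```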
    TIMELY: RTT-based Congestion Control for the Datacenter
    Radhika Mittal
    Terry Lam
    Emily Blem
    Monia Ghobadi
    Amin Vahdat
    David Zats
    SIGCOMM 2015
    Abstract: Datacenter transports aim to deliver low latency messaging together with high throughput. We show that simple packet delay, measured as round-trip times at hosts, is an effective congestion signal without the need for switch feedback. First, we show that advances in NIC hardware have made RTT measurement possible with microsecond accuracy, and that these RTTs are sufficient to estimate switch queueing. Then we describe how TIMELY can adjust transmission rates using RTT gradients to keep packet latency low while delivering high bandwidth. We implement our design in host software running over NICs with OS-bypass capabilities. We show using experiments with up to hundreds of machines on a Clos network topology that it provides excellent performance: turning on TIMELY for OS-bypass messaging over a fabric with PFC lowers 99th-percentile tail latency by 9X while maintaining near line-rate throughput. Our system also outperforms DCTCP running in an optimized kernel, reducing tail latency by 13X. To the best of our knowledge, TIMELY is the first delay-based congestion control protocol for use in the datacenter, and it achieves its results despite having an order of magnitude fewer RTT signals (due to NIC offload) than earlier delay-based schemes such as Vegas.
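The TIMELY abstract describes adjusting transmission rates from the gradient of RTT measurements. The sketch below shows a gradient-based rate update of that flavor: an EWMA of successive RTT changes, normalized by the minimum RTT, drives additive increase when queues are draining and multiplicative decrease when they are building. The class, names, and constants are assumptions; the published algorithm additionally uses low/high RTT thresholds and a hyperactive-increase mode omitted here.

```python
class TimelyLikeRateControl:
    """Gradient-based rate control sketch; names and constants are assumptions."""

    def __init__(self, rate_gbps: float, min_rtt_us: float):
        self.rate = rate_gbps
        self.min_rtt = min_rtt_us
        self.prev_rtt = None
        self.rtt_diff = 0.0       # EWMA of successive RTT differences
        self.alpha = 0.1          # EWMA weight on the newest sample (assumed)
        self.delta = 0.5          # additive increase step in Gbps (assumed)
        self.beta = 0.8           # multiplicative decrease factor (assumed)

    def on_rtt(self, rtt_us: float) -> float:
        """Update the sending rate from one new RTT measurement."""
        if self.prev_rtt is not None:
            new_diff = rtt_us - self.prev_rtt
            self.rtt_diff = (1 - self.alpha) * self.rtt_diff + self.alpha * new_diff
            gradient = self.rtt_diff / self.min_rtt   # normalized RTT gradient
            if gradient <= 0:
                self.rate += self.delta               # queues draining: speed up
            else:
                # Queues building: back off in proportion to the gradient,
                # with a floor so the rate stays positive.
                self.rate *= max(1.0 - self.beta * gradient, 0.1)
        self.prev_rtt = rtt_us
        return self.rate

cc = TimelyLikeRateControl(rate_gbps=10.0, min_rtt_us=20.0)
for rtt in [20, 22, 30, 45, 40, 30, 25]:
    print(f"rtt={rtt}us -> rate={cc.on_rtt(rtt):.2f} Gbps")
```

Reacting to the gradient rather than the absolute RTT is the design choice that lets the sender respond to queues that are growing before they become large, which is how the scheme keeps latency low without any switch feedback.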