Yaogong Wang
Research Areas
Authored Publications
Sort By
Swift: Delay is Simple and Effective for Congestion Control in the Datacenter
Keon Jang
Kevin Springborn
Mike Ryan
SIGCOMM 2020 (2020)
Preview abstract
We report on experiences deploying Swift congestion control in Google datacenters. Swift relies on hardware timestamps in modern NICs, and is based on AIMD control with a specified end-to-end delay target. This simple design is an evolution of earlier protocols used at Google. It has emerged as a foundation for excellent performance, when network distances are well-known, that helps to meet operational challenges. Delay is easy to decompose into fabric and host components to separate concerns, and effortless to deploy and maintain as a signal from switches in changing datacenter environments. With Swift, we obtain low flow completion times for short RPCs, even at the 99th-percentile, while providing high throughput for long RPCs. At datacenter scale, Swift achieves 50$\mu$s tail latencies for short RPCs while sustaining a 100Gbps throughput per-server, a load close to 100\%. This is much better than protocols such as DCTCP that degrade latency and loss at high utilization.
View details
PicNIC: Predictable Virtualized NIC
Nathan Lewis
Yi Cui
Valas Valancius
Jake Adriaens
Nate Foster
ACM SIGCOMM 2019
Preview abstract
Network virtualization stacks such as Andromeda and Virtual Filtering Platform are the linchpins of public clouds hosting Virtual Machines (VMs). The dataplane is based on a combination of high performance OS bypass software and hardware packet processing paths. A key goal is to provide network performance isolation such that workloads of one VM do not adversely impact the network experience of another VM. In this work, we characterize how isolation breakages occur in virtualization stacks and motivate predictable VM performance just as if they were operating on dedicated hardware. We formulate an abstraction of a Predictable Virtualized NIC for bandwidth, latency and packet loss. We propose three constructs to achieve predictability: egress traffic shaping, and a combination of congestion control and CPU-fair weighted fair queueing for ingress isolation. Using these constructs in coherence, we provide the illusion of a dedicated NIC to VMs, all while maintaining the raw performance of the fastpath dataplane.
View details
TIMELY: RTT-based Congestion Control for the Datacenter
Radhika Mittal
Terry Lam
Emily Blem
Monia Ghobadi
Amin Vahdat
David Zats
Sigcomm '15, Google Inc (2015)
Preview abstract
Datacenter transports aim to deliver low latency messaging together with high throughput. We show that simple packet delay, measured as round-trip times at hosts, is an effective congestion signal without the need for switch feedback. First, we show that advances in NIC hardware have made RTT measurement possible with microsecond accuracy, and that these RTTs are sufficient to estimate switch queueing. Then we describe how TIMELY can adjust transmission rates using RTT gradients to keep packet latency low while delivering high bandwidth. We implement our design in host software running over NICs with OS-bypass capabilities. We show using experiments with up to hundreds of machines on a Clos network topology that it provides excellent performance: turning on TIMELY for OS-bypass messaging over a fabric with PFC lowers 99 percentile tail latency by 9X while maintaining near line-rate throughput. Our system also outperforms DCTCP running in an optimized kernel, reducing tail latency by 13X. To the best of our knowledge, TIMELY is the first delay-based congestion control protocol for use in the datacenter, and it achieves its results despite having an order of magnitude fewer RTT signals (due to NIC offload) than earlier delay-based schemes such as Vegas.
View details