Jump to Content
Xi Wang

Xi Wang

Xi works on Linux kernel and system performance. His interests include cpu scheduling, software - microarchitecture interactions, power efficiency, networking, scalability, low latency and read-time.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Thunderbolt: Throughput-Optimized, Quality-of-Service-Aware Power Capping at Scale
    Shaohong Li
    Xiao Zhang
    Vasileios Kontorinis
    Sreekumar Kodakara
    Parthasarathy Ranganathan
    14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), {USENIX} Association (2020), pp. 1241-1255
    Preview abstract As the demand for data center capacity continues to grow, hyperscale providers have used power oversubscription to increase efficiency and reduce costs. Power oversubscription requires power capping systems to smooth out the spikes that risk overloading power equipment by throttling workloads. Modern compute clusters run latency-sensitive serving and throughput-oriented batch workloads on the same servers, provisioning resources to ensure low latency for the former while using the latter to achieve high server utilization. When power capping occurs, it is desirable to maintain low latency for serving tasks and throttle the throughput of batch tasks. To achieve this, we seek a system that can gracefully throttle batch workloads and has task-level quality-of-service (QoS) differentiation. In this paper we present Thunderbolt, a hardware-agnostic power capping system that ensures safe power oversubscription while minimizing impact on both long-running throughput-oriented tasks and latency-sensitive tasks. It uses a two-threshold, randomized unthrottling/multiplicative decrease control policy to ensure power safety with minimized performance degradation. It leverages the Linux kernel's CPU bandwidth control feature to achieve task-level QoS-aware throttling. It is robust even in the face of power telemetry unavailability. Evaluation results at the node and cluster levels demonstrate the system's responsiveness, effectiveness for reducing power, capability of QoS differentiation, and minimal impact on latency and task health. We have deployed this system at scale, in multiple production clusters. As a result, we enabled power oversubscription gains of 9%--25%, where none was previously possible. View details
    Snap: a Microkernel Approach to Host Networking
    Jacob Adriaens
    Sean Bauer
    Carlo Contavalli
    Mike Dalton
    William C. Evans
    Nicholas Kidd
    Roman Kononov
    Carl Mauer
    Emily Musick
    Lena Olson
    Mike Ryan
    Erik Rubow
    Kevin Springborn
    Valas Valancius
    Amin Vahdat
    In ACM SIGOPS 27th Symposium on Operating Systems Principles, ACM, New York, NY, USA (2019) (to appear)
    Preview abstract This paper presents our design and experience with a microkernel-inspired approach to host networking called Snap. Snap is a userspace networking system that supports Google’s rapidly evolving needs with flexible modules that implement a range of network functions, including edge packet switching, virtualization for our cloud platform, traffic shaping policy enforcement, and a high-performance reliable messaging and RDMA-like service. Snap has been running in production for over three years, supporting the extensible communication needs of several large and critical systems. Snap enables fast development and deployment of new networking features, leveraging the benefits of address space isolation and the productivity of userspace software development together with support for transparently upgrading networking services without migrating applications off of a machine. At the same time, Snap achieves compelling performance through a modular architecture that promotes principled synchronization with minimal state sharing, and supports real-time scheduling with dynamic scaling of CPU resources through a novel kernel/userspace CPU scheduler co-design. Our evaluation demonstrates over 3x Gbps/core improvement compared to a kernel networking stack for RPC workloads, software-based RDMA-like performance of up to 5M IOPS/core, and transparent upgrades that are largely imperceptible to user applications. Snap is deployed to over half of our fleet of machines and supports the needs of numerous teams. View details
    No Results Found