Xiao Zhang

Xiao Zhang received his PhD in Computer Science from the University of Rochester in 2010. Before that, he earned a BS in Computer Science from the University of Science and Technology of China. His research focuses on operating systems and computer architecture; he now works primarily on systems for accelerating domain-specific computing, as well as on power-management systems.
Authored Publications
    GASS: GPU Automated Sharing at Scale
    Dragos Sbirlea
    Jiafan Zhu
    Konstantinos Menychtas
    Yuang Liu
    Zhijing Gene Qin
    The IEEE International Conference on Cloud Computing (CLOUD), 2024
    Abstract: General-purpose GPUs, with their powerful numerical computing capacity, are popular platforms for accelerating machine-learning workloads. However, our experience with a large-scale production deployment shows that typical GPU workloads often fail to keep the GPU pipeline fully occupied, resulting in low overall resource utilization. To address this inefficiency, we have designed and implemented GPU Automated Sharing at Scale (GASS). GASS relies on fine-grained time-multiplexing to let GPU compute resources be shared among different tasks, and on on-demand paging to let GPU memory be shared among them. GASS mitigates sharing performance anomalies by using real-time performance monitoring to drive adaptive rescheduling. Our cluster-level evaluation shows that aggregated GPU throughput is increased by 50% under GASS and that sharing enables the cluster to support 19% more GPU jobs.
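    The monitoring-driven rescheduling idea can be sketched in a few lines. The following is a minimal, hypothetical illustration, not GASS's actual implementation (the names, data structures, and slowdown threshold are all assumptions): tasks whose measured throughput on a shared GPU falls too far below their solo baseline are moved to a less-contended GPU.

      # Hypothetical sketch of monitoring-driven rescheduling; not the GASS code.
      from dataclasses import dataclass

      SLOWDOWN_LIMIT = 0.7  # keep >= 70% of solo throughput (assumed threshold)

      @dataclass
      class Task:
          name: str
          solo_throughput: float    # throughput measured when running alone
          shared_throughput: float  # latest throughput on the shared GPU

      def needs_rescheduling(task: Task) -> bool:
          """Flag a task whose sharing penalty exceeds the threshold."""
          return task.shared_throughput < SLOWDOWN_LIMIT * task.solo_throughput

      def rebalance(gpus: dict[str, list[Task]]) -> None:
          """Move anomalous tasks to the least-loaded GPU."""
          for gpu, tasks in gpus.items():
              for task in [t for t in tasks if needs_rescheduling(t)]:
                  target = min(gpus, key=lambda g: len(gpus[g]))
                  if target != gpu:
                      tasks.remove(task)
                      gpus[target].append(task)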
    Thunderbolt: Throughput-Optimized, Quality-of-Service-Aware Power Capping at Scale
    Shaohong Li
    Sreekumar Kodakara
    14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), USENIX Association (2020), pp. 1241-1255
    Abstract: As the demand for data center capacity continues to grow, hyperscale providers have used power oversubscription to increase efficiency and reduce costs. Power oversubscription requires power capping systems to smooth out the spikes that risk overloading power equipment by throttling workloads. Modern compute clusters run latency-sensitive serving and throughput-oriented batch workloads on the same servers, provisioning resources to ensure low latency for the former while using the latter to achieve high server utilization. When power capping occurs, it is desirable to maintain low latency for serving tasks and throttle the throughput of batch tasks. To achieve this, we seek a system that can gracefully throttle batch workloads and has task-level quality-of-service (QoS) differentiation. In this paper we present Thunderbolt, a hardware-agnostic power capping system that ensures safe power oversubscription while minimizing impact on both long-running throughput-oriented tasks and latency-sensitive tasks. It uses a two-threshold, randomized unthrottling/multiplicative decrease control policy to ensure power safety with minimized performance degradation. It leverages the Linux kernel's CPU bandwidth control feature to achieve task-level QoS-aware throttling. It is robust even in the face of power telemetry unavailability. Evaluation results at the node and cluster levels demonstrate the system's responsiveness, effectiveness for reducing power, capability of QoS differentiation, and minimal impact on latency and task health. We have deployed this system at scale, in multiple production clusters. As a result, we enabled power oversubscription gains of 9%–25%, where none was previously possible.
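    The two-threshold control policy is easy to sketch. Below is an illustrative, simplified loop (the watermarks, step sizes, and names are assumptions, not the paper's tuned values): power draw above a high watermark triggers a multiplicative cut to batch tasks' CPU bandwidth, while draw below a low watermark lets each batch task unthrottle with some probability, so release is staggered rather than synchronized.

      # Illustrative two-threshold, randomized-unthrottling/multiplicative-
      # decrease loop; constants are assumed, not Thunderbolt's actual values.
      import random

      HIGH_WATERMARK = 0.95  # fraction of the power budget that triggers capping
      LOW_WATERMARK = 0.85   # below this, throttled tasks may be released
      DECREASE_FACTOR = 0.5  # multiplicative cut to batch CPU bandwidth
      UNTHROTTLE_PROB = 0.1  # per-task chance to unthrottle each interval

      def control_step(power: float, budget: float,
                       batch_quota: dict[str, float]) -> None:
          """One capping iteration; batch_quota maps task -> CPU share in (0, 1].
          Latency-sensitive serving tasks are deliberately left untouched,
          which is the task-level QoS differentiation the paper describes."""
          usage = power / budget
          if usage > HIGH_WATERMARK:
              for task in batch_quota:
                  batch_quota[task] *= DECREASE_FACTOR
          elif usage < LOW_WATERMARK:
              # Randomized unthrottling staggers release to avoid a power surge.
              for task in batch_quota:
                  if random.random() < UNTHROTTLE_PROB:
                      batch_quota[task] = min(1.0, batch_quota[task] * 2)

    On Linux, the per-task share would be enforced through the kernel's CPU bandwidth control (e.g., cpu.cfs_quota_us in cgroup v1), which is the mechanism the abstract cites.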
    Evaluation of NUMA-Aware Scheduling in Warehouse-Scale Clusters
    Richard Wu
    Xiangling Kong
    Yangyi Chen
    Robert Hagmann
    Rohit Jnagal
    IEEE CLOUD 2019
    Abstract: Non-uniform memory access (NUMA) has been extensively studied at the machine level, but few studies have examined NUMA optimizations at the cluster level. This paper introduces a holistic NUMA-aware scheduling policy that combines both machine-level and cluster-level NUMA-aware optimizations. We evaluate our holistic NUMA-aware scheduling policy on Google's production cluster trace with a cluster scheduling simulator that measures the impact of NUMA-aware scheduling under two scheduling algorithms, Best Fit and Enhanced PVM (E-PVM). While our results show that a holistic NUMA-aware scheduling policy substantially increases the proportion of NUMA-fit tasks, by 22.0% and 25.6% for the Best Fit and E-PVM scheduling algorithms, respectively, there is a non-trivial tradeoff between cluster job packing efficiency and NUMA fitness for the E-PVM algorithm under certain circumstances.
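    To make "NUMA-fit" concrete: a task is NUMA-fit when its CPU and memory demands fit within a single NUMA node, so it avoids cross-socket memory traffic. The sketch below uses hypothetical structures (the paper's simulator and policies are more involved): it prefers machines where the task can be NUMA-fit and falls back to aggregate capacity otherwise, which is where the packing-efficiency tradeoff arises.

      # Hypothetical NUMA-fitness placement; not the paper's simulator.
      from dataclasses import dataclass

      @dataclass
      class Node:
          free_cpus: float
          free_mem_gb: float

      def numa_fit(cpus: float, mem_gb: float, nodes: list[Node]) -> bool:
          """True if a single NUMA node can hold the whole task."""
          return any(n.free_cpus >= cpus and n.free_mem_gb >= mem_gb
                     for n in nodes)

      def pick_machine(cpus: float, mem_gb: float,
                       machines: dict[str, list[Node]]) -> str | None:
          """Prefer NUMA-fit placements; otherwise accept any machine with
          enough aggregate capacity (trading NUMA fitness for packing)."""
          fallback = None
          for name, nodes in machines.items():
              if numa_fit(cpus, mem_gb, nodes):
                  return name
              if (sum(n.free_cpus for n in nodes) >= cpus and
                      sum(n.free_mem_gb for n in nodes) >= mem_gb):
                  fallback = fallback or name
          return fallback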
    HaPPy: Hyperthread-aware Power Profiling Dynamically
    Yan Zhai
    Stephane Eranian
    Lingjia Tang
    Jason Mars
    USENIX Annual Technical Conference 2014
    CPI²: CPU performance isolation for shared compute clusters
    Robert Hagmann
    Rohit Jnagal
    Vrigo Gokhale
    SIGOPS European Conference on Computer Systems (EuroSys), ACM, Prague, Czech Republic (2013), pp. 379-391
    Abstract: Performance isolation is a key challenge in cloud computing. Unfortunately, Linux has few defenses against performance interference in shared resources such as processor caches and memory buses, so applications in a cloud can experience unpredictable performance caused by other programs' behavior. Our solution, CPI², uses cycles-per-instruction (CPI) data obtained from hardware performance counters to identify problems, select the likely perpetrators, and then optionally throttle them so that the victims can return to their expected behavior. It automatically learns normal and anomalous behaviors by aggregating data from multiple tasks in the same job. We have rolled out CPI² to all of Google's shared compute clusters. The paper presents the analysis that led us to that outcome, including both case studies and a large-scale evaluation of its ability to solve real production issues.
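    The core detection step, flagging tasks whose CPI deviates from their job's norm, can be illustrated in a few lines. The threshold and sample data below are assumptions; the deployed system learns per-job CPI distributions from production telemetry.

      # Illustrative CPI outlier detection; not the production CPI² pipeline.
      import statistics

      def find_victims(cpi_samples: dict[str, float],
                       sigmas: float = 2.0) -> list[str]:
          """Flag tasks of one job whose CPI is far above the job's norm.
          cpi_samples maps task name -> cycles per instruction, as read
          from hardware performance counters."""
          values = list(cpi_samples.values())
          mean = statistics.mean(values)
          stdev = statistics.pstdev(values)
          return [task for task, cpi in cpi_samples.items()
                  if cpi > mean + sigmas * stdev]

      # Example: one task runs with markedly higher CPI than its peers.
      samples = {f"task-{i}": cpi for i, cpi in
                 enumerate([1.0, 1.1, 1.05, 0.95, 1.15, 1.2, 1.0, 2.4])}
      print(find_victims(samples))  # ['task-7']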
    Optimizing Google's Warehouse Scale Computers: The NUMA Experience
    Lingjia Tang
    Jason Mars
    Robert Hagmann
    The 19th IEEE International Symposium on High Performance Computer Architecture (2013)