Ashwin Chaugule
Ashwin works on the technical infrastructure that runs the production fleet for Search, YouTube, Cloud, and more. His interests span the full gamut of challenges, from performance optimization of complex systems to enabling novel technologies at scale. He received his Master's in Computer Science from the Pennsylvania State University.
Authored Publications
ghOSt: Fast and Flexible User-Space Delegation of Linux Scheduling
Jack Tigar Humphries
Neel Natu
Ofir Weisse
Barret Rhoden
Josh Don
Oleg Rombakh
Christos Kozyrakis
Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles CD-ROM, Association for Computing Machinery, New York, NY, USA (2021), 588–604
We present ghOSt, our infrastructure for delegating kernel scheduling decisions to userspace code. ghOSt is designed to support the rapidly evolving needs of our data center workloads and platforms.
Improving scheduling decisions can drastically improve the throughput, tail latency, scalability, and security of important workloads. However, kernel schedulers are difficult to implement, test, and deploy efficiently across a large fleet. Recent research suggests bespoke scheduling policies, within custom data plane operating systems, can provide compelling performance results in a data center setting. However, these gains have proved difficult to realize because it is impractical to deploy custom OS images at application granularity, particularly in a multi-tenant environment, which limits the practical applications of these new techniques.
ghOSt provides general-purpose delegation of scheduling policies to userspace processes in a Linux environment. ghOSt provides state encapsulation, communication, and action mechanisms that allow complex expression of scheduling policies within a userspace agent, while assisting in synchronization. Programmers use any language to develop and optimize policies, which are modified without a host reboot. ghOSt supports a wide range of scheduling models, from per-CPU to centralized, run-to-completion to preemptive, and incurs low overheads for scheduling actions. We demonstrate ghOSt's performance on both academic and real-world workloads, including Google Snap and Google Search. We show that by using ghOSt instead of the kernel scheduler, we can quickly achieve comparable throughput and latency while enabling policy optimization, non-disruptive upgrades, and fault isolation for our data center workloads. We open-source our implementation to enable future research and development based on ghOSt.
Software-defined far memory in warehouse-scale computers
Andres Lagar-Cavilla
Suleiman Souhlal
Neha Agarwal
Radoslaw Burny
Shakeel Butt
Junaid Shahid
Greg Thelen
Kamil Adam Yurtsever
Yu Zhao
International Conference on Architectural Support for Programming Languages and Operating Systems (2019)
Increasing memory demand and slowdown in technology scaling pose important challenges to total cost of ownership (TCO) of warehouse-scale computers (WSCs). One promising idea to reduce the memory TCO is to add a cheaper, but slower, "far memory" tier and use it to store infrequently accessed (or cold) data. However, introducing a far memory tier brings new challenges around dynamically responding to workload diversity and churn, minimizing stranding of capacity, and addressing brownfield (legacy) deployments.
We present a novel software-defined approach to far memory that proactively compresses cold memory pages to effectively create a far memory tier in software. Our end-to-end system design encompasses new methods to define performance service-level objectives (SLOs), a mechanism to identify cold memory pages while meeting the SLO, and our implementation in the OS kernel and node agent. Additionally, we design learning-based autotuning to periodically adapt our design to fleet-wide changes without a human in the loop. Our system has been successfully deployed across Google's WSC since 2016, serving thousands of production services. Our software-defined far memory is significantly cheaper (67% or higher memory cost reduction) at relatively good access speeds (6 µs) and allows us to store a significant fraction of infrequently accessed data (on average, 20%), translating to significant TCO savings at warehouse scale.