Eric Tune
Eric Tune received his PhD in Computer Science from the University of California, San Diego in 2004. His research focuses on computer architecture and datacenter computing.
Authored Publications
Large-scale cluster management at Google with Borg
Luis Pedrosa
Madhukar R. Korupolu
David Oppenheimer
Proceedings of the European Conference on Computer Systems (EuroSys), ACM, Bordeaux, France (2015)
Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.
It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior.
We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.
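The utilization techniques named in the abstract (checking whether a task can run on a machine, over-committing resources, and packing tasks tightly) can be illustrated with a generic best-fit placement sketch. This is not Borg's scheduler or scoring function: the two-resource model, the fixed 1.2x CPU over-commitment ratio, the choice to over-commit CPU but not memory, and the names `Machine`, `Task`, `fits`, `leftover`, and `place` are all hypothetical, chosen only to show how over-commitment and tight packing raise density.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical ratio: CPU may be promised up to 120% of physical capacity.
OVERCOMMIT_CPU = 1.2


@dataclass
class Machine:
    name: str
    cpu: float             # physical cores
    mem: float             # GiB
    used_cpu: float = 0.0  # cores already promised to placed tasks
    used_mem: float = 0.0  # GiB already promised to placed tasks


@dataclass
class Task:
    cpu: float
    mem: float


def fits(m: Machine, t: Task) -> bool:
    """Feasibility check. Assumption: CPU (compressible) may be
    over-committed, memory (not compressible) may not."""
    return (m.used_cpu + t.cpu <= m.cpu * OVERCOMMIT_CPU
            and m.used_mem + t.mem <= m.mem)


def leftover(m: Machine, t: Task) -> float:
    """Normalized slack remaining after placing t; smaller is a tighter fit."""
    cpu_left = (m.cpu * OVERCOMMIT_CPU - m.used_cpu - t.cpu) / m.cpu
    mem_left = (m.mem - m.used_mem - t.mem) / m.mem
    return cpu_left + mem_left


def place(machines: List[Machine], t: Task) -> Optional[str]:
    """Best fit: among feasible machines, choose the one with the least slack.

    Returns the chosen machine's name, or None if no machine fits
    (the task would stay pending).
    """
    feasible = [m for m in machines if fits(m, t)]
    if not feasible:
        return None
    best = min(feasible, key=lambda m: leftover(m, t))
    best.used_cpu += t.cpu
    best.used_mem += t.mem
    return best.name
```

For example, with an 8-core/32 GiB machine `m1` and a 4-core/16 GiB machine `m2`, a 3-core/12 GiB task lands on `m2` because it leaves less slack there, keeping `m1` free for a larger task later. Best fit is only one of several packing heuristics; it tends to reduce fragmentation at the cost of concentrating load.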
CPI²: CPU performance isolation for shared compute clusters
Robert Hagmann
Rohit Jnagal
Vrigo Gokhale
SIGOPS European Conference on Computer Systems (EuroSys), ACM, Prague, Czech Republic (2013), pp. 379-391
Performance isolation is a key challenge in cloud computing. Unfortunately, Linux has few defenses against performance interference in shared resources such as processor caches and memory buses, so applications in a cloud can experience unpredictable performance caused by the behavior of other programs.
Our solution, CPI², uses cycles-per-instruction (CPI) data obtained from hardware performance counters to identify problems, select the likely perpetrators, and then optionally throttle them so that the victims can return to their expected behavior. It automatically learns normal and anomalous behaviors by aggregating data from multiple tasks in the same job.
We have rolled out CPI² to all of Google's shared compute clusters. The paper presents the analysis that led us to that outcome, including both case studies and a large-scale evaluation of its ability to solve real production issues.
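As a rough illustration of the detection idea described above, the Python sketch below computes CPI from two counter readings, flags tasks whose CPI sits more than `k` standard deviations above the mean of their job's peers, and ranks co-located tasks by how closely their CPU usage correlates with a victim's CPI over time. The threshold `k = 2`, the function names, and the use of a plain Pearson correlation are assumptions for illustration; the sketch omits time windows, repeated-violation checks, and throttling, and is not the paper's implementation.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Tuple


def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction from two hardware-counter readings."""
    return cycles / instructions


def cpi_outliers(job_cpi: Dict[str, float], k: float = 2.0) -> List[str]:
    """Tasks whose CPI exceeds the job-wide mean by more than k std devs.

    job_cpi maps task name -> CPI sample. All tasks belong to one job,
    so they are expected to behave alike; k = 2.0 is an assumed default.
    """
    values = list(job_cpi.values())
    if len(values) < 2:
        return []  # too few peers to define "normal"
    threshold = mean(values) + k * stdev(values)
    return sorted(t for t, v in job_cpi.items() if v > threshold)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_suspects(victim_cpi: List[float],
                  suspect_cpu: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank co-located tasks by how well their CPU usage tracks the victim's CPI."""
    scores = [(name, pearson(victim_cpi, usage))
              for name, usage in suspect_cpu.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

With nine tasks at CPI 1.0 and one at 3.0, `cpi_outliers` flags only the 3.0 task. A suspect whose CPU usage rises and falls with the victim's CPI ranks above one whose usage stays flat, which is the intuition behind picking "likely perpetrators" rather than certain ones.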
Optimizing Google's Warehouse Scale Computers: The NUMA Experience
Lingjia Tang
Jason Mars
Robert Hagmann
The 19th IEEE International Symposium on High Performance Computer Architecture (2013)
Google-Wide Profiling (GWP), a continuous profiling infrastructure for data centers, provides performance insights for cloud applications. With negligible overhead, GWP provides stable, accurate profiles and a datacenter-scale tool for traditional performance analyses. Furthermore, GWP introduces novel applications of its profiles, such as application-platform affinity measurements and identification of platform-specific, microarchitectural peculiarities.