Nan Deng
Nan Deng works on Borg, a cluster management system used inside Google. His research focuses on resource scheduling and isolation in datacenters. He received his PhD from the Ohio State University, where his adviser was Dr. Christopher Stewart.
Authored Publications
The ability to accurately estimate job runtime properties allows a scheduler to schedule jobs effectively. State-of-the-art online cluster job schedulers use history-based learning, which uses past job execution information to estimate the runtime properties of newly arrived jobs. However, with fast-paced development in cluster technology (in both hardware and software) and changing user inputs, job runtime properties can change over time, which leads to inaccurate predictions.
In this paper, we explore the potential and limitations of real-time learning of job runtime properties by proactively sampling and scheduling a small fraction of the tasks of each job. Such a task-sampling-based approach exploits the similarity among runtime properties of the tasks of the same job and is inherently immune to changing job behavior. Our analytical and experimental analysis of 3 production traces with different skew and job distributions shows that learning in space can be substantially more accurate. Our simulation and testbed evaluation on Azure of the two learning approaches, anchored in a generic job scheduler using the 3 production cluster job traces, shows that despite its online overhead, learning in space reduces the average Job Completion Time (JCT) by 1.28×, 1.56×, and 1.32× compared to the prior-art history-based predictor. Finally, we show how sampling-based learning can be extended to schedule DAG jobs and achieve similar speedups over the prior-art history-based predictor.
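The core sampling idea can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation; `estimate_job_runtime` and its parameters are hypothetical names, and in a real scheduler the sampled tasks would actually be executed rather than looked up:

```python
import random

def estimate_job_runtime(task_runtimes, sample_frac=0.05, min_samples=2):
    """Estimate a job's mean task runtime by running only a small
    sample of its tasks first ("learning in space"), instead of
    relying on the job's execution history."""
    k = max(min_samples, int(len(task_runtimes) * sample_frac))
    k = min(k, len(task_runtimes))
    sampled = random.sample(task_runtimes, k)
    return sum(sampled) / len(sampled)

# Tasks of the same job tend to have similar runtimes, so a small
# sample predicts the job-wide mean well even when the job's behavior
# has drifted from its historical runs.
random.seed(42)
job = [10 + random.gauss(0, 1) for _ in range(1000)]  # ~10 s tasks
estimate = estimate_job_runtime(job)
```

Because the estimate comes from the job's own freshly executed tasks, it needs no past executions of the job at all, which is what makes it immune to changes in hardware, software, or inputs.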
Take it to the Limit: Peak Prediction-driven Resource Overcommitment in Datacenters
Noman Bashir
David Irwin
Sree Kodakara
Rohit Jnagal
Eurosys 2021 (2021)
To increase utilization, datacenter schedulers often overcommit resources, where the sum of resources allocated to the tasks on a machine exceeds its physical capacity. Setting the right level of overcommitment is a challenging problem: low overcommitment leads to wasted resources, while high overcommitment leads to task performance degradation. In this paper, we take a first-principles approach to designing and evaluating overcommit policies by asking a basic question: assuming complete knowledge of each task's future resource usage, what is the safest overcommit policy that yields the highest utilization? We call this policy the peak oracle. We then devise practical overcommit policies that mimic this peak oracle by predicting future machine resource usage. We simulate our overcommit policies using the recently-released Google cluster trace, and show that they result in higher utilization and fewer overcommit errors than policies based on per-task allocations. We also deploy these policies to machines inside Google's datacenters serving its internal production workload. We show that our overcommit policies increase these machines' usable CPU capacity by 10-16% compared to no overcommitment.
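The peak-oracle idea can be sketched as follows. This is a minimal illustration under assumed names (`peak_oracle`, `predicted_peak`), not Google's actual predictor: the oracle's limit is the peak of the machine's aggregate usage, which is typically well below the sum of per-task peaks because task peaks rarely coincide:

```python
def peak_oracle(task_usage):
    """Given complete knowledge of each task's future usage (one
    list per task, aligned in time), the safest machine limit is
    the peak of the *aggregate* usage across tasks."""
    aggregate = [sum(step) for step in zip(*task_usage)]
    return max(aggregate)

def predicted_peak(usage_history, window=5):
    """A practical stand-in for the oracle (an illustrative policy,
    not the paper's exact one): forecast the future machine peak as
    the maximum aggregate usage seen over a recent window."""
    return max(usage_history[-window:])

# Two tasks whose peaks do not coincide: the oracle allows far more
# overcommitment than summing per-task peak allocations would.
tasks = [[0.6, 0.2, 0.6, 0.2], [0.2, 0.6, 0.2, 0.6]]
oracle_limit = peak_oracle(tasks)          # 0.8
sum_of_peaks = sum(max(t) for t in tasks)  # 1.2
```

The gap between `oracle_limit` and `sum_of_peaks` is exactly the capacity an overcommit policy can safely reclaim; a practical policy succeeds to the extent its prediction tracks the oracle without undershooting it.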
Borg: the Next Generation
Muhammad Tirmazi
Adam Barker
Md Ehtesam Haque
Zhijing Gene Qin
Mor Harchol-Balter
EuroSys'20, ACM, Heraklion, Crete (2020)
This paper analyzes a newly-published trace that covers 8 different Borg clusters for the month of May 2019. The trace enables researchers to explore how scheduling works in large-scale production compute clusters. We highlight how Borg has evolved and perform a longitudinal comparison of the newly-published 2019 trace against the 2011 trace, which has been highly cited within the research community.

Our findings show that Borg features such as alloc sets are used for resource-heavy workloads; automatic vertical scaling is effective; job dependencies account for much of the high failure rates reported by prior studies; the workload arrival rate has increased, as has the use of resource overcommitment; the workload mix has changed, with jobs migrating from the free tier into the best-effort batch tier; the workload exhibits an extremely heavy-tailed distribution where the top 1% of jobs consume over 99% of resources; and there is a great deal of variation between different clusters.
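The heavy-tail finding can be quantified with a simple statistic. The sketch below is illustrative (the function name and the toy data are assumptions, not taken from the trace): compute the fraction of total consumption attributable to the largest 1% of jobs:

```python
def top_share(job_resources, top_frac=0.01):
    """Fraction of total resource consumption attributable to the
    top `top_frac` of jobs; values near 1.0 indicate a very
    heavy-tailed workload, as reported for the 2019 Borg trace."""
    ranked = sorted(job_resources, reverse=True)
    k = max(1, int(len(ranked) * top_frac))
    return sum(ranked[:k]) / sum(ranked)

# Toy example: one huge job among a hundred. The top 1% here is a
# single job, yet it accounts for ~99% of all consumption.
usage = [1000.0] + [0.1] * 99
share = top_share(usage)
```

Such a skew has direct scheduling implications: mispredicting a handful of giant jobs matters far more than mispredicting thousands of small ones.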
Software-defined far memory in warehouse-scale computers
Andres Lagar-Cavilla
Suleiman Souhlal
Neha Agarwal
Radoslaw Burny
Shakeel Butt
Junaid Shahid
Greg Thelen
Kamil Adam Yurtsever
Yu Zhao
International Conference on Architectural Support for Programming Languages and Operating Systems (2019)
Increasing memory demand and slowdown in technology scaling pose important challenges to total cost of ownership (TCO) of warehouse-scale computers (WSCs). One promising idea to reduce the memory TCO is to add a cheaper, but slower, "far memory" tier and use it to store infrequently accessed (or cold) data. However, introducing a far memory tier brings new challenges around dynamically responding to workload diversity and churn, minimizing stranding of capacity, and addressing brownfield (legacy) deployments.
We present a novel software-defined approach to far memory that proactively compresses cold memory pages to effectively create a far memory tier in software. Our end-to-end system design encompasses new methods to define performance service-level objectives (SLOs), a mechanism to identify cold memory pages while meeting the SLO, and our implementation in the OS kernel and node agent. Additionally, we design learning-based autotuning to periodically adapt our design to fleet-wide changes without a human in the loop. Our system has been successfully deployed across Google's WSC since 2016, serving thousands of production services. Our software-defined far memory is significantly cheaper (67% or higher memory cost reduction) at relatively good access speeds (6 µs) and allows us to store a significant fraction of infrequently accessed data (on average, 20%), translating to significant TCO savings at warehouse scale.
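The core mechanism can be sketched as a two-tier page store. This is a minimal illustration under assumed names (`FarMemoryTier`, `cold_age`), not the paper's kernel implementation: pages left untouched beyond an idle threshold are compressed in place, trading CPU for memory capacity, in the spirit of Linux's zswap:

```python
import time
import zlib

class FarMemoryTier:
    """Minimal sketch of software-defined far memory: pages idle
    longer than `cold_age` seconds are compressed into a software
    "far memory" tier; a later access decompresses (promotes) them."""

    def __init__(self, cold_age=120.0):
        self.cold_age = cold_age
        self.hot = {}   # page_id -> (raw bytes, last-access timestamp)
        self.cold = {}  # page_id -> compressed bytes

    def write(self, page_id, data):
        self.cold.pop(page_id, None)
        self.hot[page_id] = (data, time.monotonic())

    def read(self, page_id):
        # Reading a cold page promotes it back to the hot tier.
        if page_id in self.cold:
            data = zlib.decompress(self.cold.pop(page_id))
        else:
            data, _ = self.hot[page_id]
        self.hot[page_id] = (data, time.monotonic())
        return data

    def demote_cold(self, now=None):
        # Periodic pass: compress pages idle longer than cold_age.
        now = time.monotonic() if now is None else now
        for pid, (data, last) in list(self.hot.items()):
            if now - last >= self.cold_age:
                self.cold[pid] = zlib.compress(data)
                del self.hot[pid]
```

In the real system the cold-age threshold is not fixed: it is tuned per workload so that the rate of promotions (cold pages touched again) stays within the performance SLO, which is where the paper's autotuning comes in.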