Ramki Gummadi
I am with the Brain team at Google Research, where I work on both basic research and applied engagements. My interests include reinforcement learning, optimization, information theory and systems.
Authored Publications
HALP: Heuristic Aided Learned Preference Eviction Policy for YouTube Content Delivery Network
Zhenyu Song
Kevin Chen
Nikhil Sarda
Eugene Brevdo
Jimmy Coleman
Xiao Ju
Pawel Jurczyk
Richard Schooler
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23), USENIX Association, Boston, MA (2023), pp. 1149-1163
Video streaming services are among the largest web applications in production and a major source of downstream internet traffic. YouTube, a large-scale video streaming service at Google, leverages a Content Delivery Network (CDN) to serve its users. A key consideration in providing a seamless service is cache efficiency. In this work, we demonstrate machine learning techniques to improve the efficiency of YouTube's CDN DRAM cache. While many recently proposed learning-based caching algorithms show promising results, we identify and address three challenges blocking deployment of such techniques in a large-scale production environment: computation overhead for learning, robust byte miss ratio improvement, and measuring impact under production noise. We propose a novel caching algorithm, HALP, which achieves low CPU overhead and robust byte miss ratio improvement by augmenting a heuristic policy with machine learning. We also propose a production measurement method, impact distribution analysis, that can accurately measure the impact distribution of a new caching algorithm deployment in a noisy production environment.
HALP has been running in YouTube CDN production as a DRAM-level eviction algorithm since early 2022 and has reliably reduced the byte miss rate during peak by an average of 9.1% while expending a modest CPU overhead of 1.8%.
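As a rough illustration of the idea (not the production implementation), the sketch below augments an LRU heuristic with a learned preference score: the heuristic nominates a handful of eviction candidates, and a scoring model picks the final victim, so only a few items are scored per eviction and the ML overhead stays small. All names here (HALPLikeCache, score_fn, num_candidates) are hypothetical.

```python
import random
from collections import OrderedDict


class HALPLikeCache:
    """Toy cache whose eviction combines an LRU heuristic with a learned score.

    The heuristic (LRU order) nominates a few candidates; a preference
    model then chooses the final victim. Scoring only a handful of
    candidates per eviction keeps the ML overhead low.
    """

    def __init__(self, capacity, num_candidates=4, score_fn=None):
        self.capacity = capacity
        self.num_candidates = num_candidates
        # score_fn maps item features to a predicted reuse likelihood;
        # the item with the lowest score is the best eviction victim.
        # A random stub stands in for a trained model here.
        self.score_fn = score_fn or (lambda features: random.random())
        self.store = OrderedDict()  # key -> (value, features), LRU order

    def get(self, key):
        if key not in self.store:
            return None  # cache miss
        self.store.move_to_end(key)  # refresh recency on a hit
        return self.store[key][0]

    def put(self, key, value, features):
        if key in self.store:
            self.store.move_to_end(key)
        elif len(self.store) >= self.capacity:
            self._evict()
        self.store[key] = (value, features)

    def _evict(self):
        # Heuristic stage: the least-recently-used items are the candidates.
        candidates = list(self.store)[: self.num_candidates]
        # Learned stage: evict the candidate with the lowest predicted reuse.
        victim = min(candidates, key=lambda k: self.score_fn(self.store[k][1]))
        del self.store[victim]
```

The design choice this sketch mirrors is that the learned model only re-ranks a small candidate set produced by a cheap heuristic, rather than scoring the whole cache on every eviction.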
Approaches to policy optimization have been motivated from diverse principles, based on how the parametric model is interpreted or how the learning objective is formulated, yet they share a common goal of maximizing expected return. To better capture the commonalities and identify the key differences between alternative policy optimization methods, we develop a unified perspective that re-expresses the underlying update rules in terms of a limited choice of gradient form and a scaling function. In particular, we identify a unified space of approximate gradient updates for policy optimization that is highly structured, yet covers both classical and recent examples, including PPO. The primary benefit is that the framework also reveals novel but still well-motivated updates that generalize existing algorithms in a way that can deliver benefits both in terms of convergence speed and final result quality. An experimental investigation demonstrates that the additional degrees of freedom identified in the unification can be leveraged to obtain non-trivial improvements both in synthetic domains and on popular deep RL benchmarks.
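A minimal sketch of this framing, under the assumption that each update can be written as the score-function gradient scaled per sample by a function of the importance ratio and advantage. The function and argument names below are illustrative, and the two scalings shown (a plain policy gradient and a PPO-style clipped variant) are standard examples rather than the paper's full family of updates.

```python
import numpy as np


def unified_pg_update(grad_log_pi, ratio, advantage, scaling="pg", clip_eps=0.2):
    """Sketch of policy updates as (gradient form) x (scaling function).

    Each per-sample update direction is the score-function gradient
    grad_log_pi, multiplied by a scalar produced by a scaling function
    of the importance ratio and the advantage. Swapping the scaling
    function recovers different algorithms.
    """
    if scaling == "pg":
        # Plain policy gradient: scale each sample by its advantage.
        scale = advantage
    elif scaling == "ppo_clip":
        # PPO's clipped surrogate min(r * A, clip(r, 1-eps, 1+eps) * A):
        # its gradient is r * A * grad_log_pi while the unclipped term is
        # active, and zero once the clip bound binds and the objective is flat.
        unclipped_active = (
            ((advantage >= 0) & (ratio <= 1 + clip_eps))
            | ((advantage < 0) & (ratio >= 1 - clip_eps))
        )
        scale = np.where(unclipped_active, ratio * advantage, 0.0)
    else:
        raise ValueError(f"unknown scaling function: {scaling}")
    # Batch-average the scaled per-sample gradients.
    return np.mean(scale[:, None] * grad_log_pi, axis=0)
```

In this view, exploring other choices of the scaling function (beyond the two shown) is what yields the additional, well-motivated updates the abstract refers to.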