A System for Massively Parallel Hyperparameter Tuning

Liam Li
Kevin Jamieson
Ekaterina Gonina
Jonathan Ben-Tzur
Moritz Hardt
Benjamin Recht
Ameet Talwalkar
Third Conference on Systems and Machine Learning (2020) (to appear)
Abstract

Modern learning models are characterized by large hyperparameter spaces and long training times. These properties, coupled with the rise of parallel computing and the productionization of machine learning, motivate developing production-quality hyperparameter optimization functionality for distributed computing settings. We address this challenge with ASHA, a simple and robust hyperparameter optimization algorithm that exploits parallelism and aggressive early-stopping to tackle large-scale hyperparameter optimization problems. Our extensive empirical results show that ASHA outperforms state-of-the-art hyperparameter optimization methods; scales linearly with the number of workers in distributed settings; and is suitable for massive parallelism, converging to a high-quality configuration in half the time taken by Vizier (Google’s internal hyperparameter optimization service) in an experiment with 500 workers. We end with a discussion of the systems considerations we encountered, and our associated solutions, when implementing ASHA in SystemX, a production-quality service for hyperparameter tuning.
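To make the idea concrete, below is a minimal Python sketch of the asynchronous promotion rule the abstract alludes to: whenever a worker frees up, ASHA either promotes a configuration in the top 1/eta fraction of some rung to the next rung (with eta times more resource) or, if no promotion is possible, starts a new randomly sampled configuration at the bottom rung. This is a sketch written from the abstract's description; the class name, method names, and sampler are illustrative assumptions, not the authors' implementation or the SystemX API.

# Minimal, self-contained sketch of ASHA's asynchronous promotion rule
# (successive halving with aggressive early-stopping, applied asynchronously).
# Names such as ASHA.get_job, ASHA.report, and sample_config are illustrative
# assumptions, not the paper's or SystemX's actual API.
import random
from collections import defaultdict


class ASHA:
    """Each free worker either promotes a promising configuration to the
    next rung or starts a new randomly sampled one at rung 0."""

    def __init__(self, sample_config, min_resource=1, reduction_factor=4, max_rung=5):
        self.sample_config = sample_config      # draws a random hyperparameter config
        self.r = min_resource                   # resource (e.g. epochs) at rung 0
        self.eta = reduction_factor             # promote the top 1/eta of each rung
        self.max_rung = max_rung
        self.results = defaultdict(dict)        # rung -> {config_id: loss}
        self.promoted = defaultdict(set)        # rung -> config_ids already promoted
        self.configs = {}                       # config_id -> hyperparameters
        self._next_id = 0

    def get_job(self):
        """Called whenever a worker is free; never blocks on other workers."""
        # Try to promote from the highest rung downward.
        for rung in reversed(range(self.max_rung)):
            finished = self.results[rung]
            k = len(finished) // self.eta       # size of the top 1/eta fraction
            top = sorted(finished, key=finished.get)[:k]
            candidates = [c for c in top if c not in self.promoted[rung]]
            if candidates:
                cid = candidates[0]
                self.promoted[rung].add(cid)
                return cid, self.configs[cid], rung + 1, self.r * self.eta ** (rung + 1)
        # Nothing promotable: grow the bottom rung with a new random configuration.
        cid, self._next_id = self._next_id, self._next_id + 1
        self.configs[cid] = self.sample_config()
        return cid, self.configs[cid], 0, self.r

    def report(self, config_id, rung, loss):
        """Record the validation loss observed for a config at a given rung."""
        self.results[rung][config_id] = loss


if __name__ == "__main__":
    # Toy usage: tune a single "learning rate" on a synthetic objective.
    # A real deployment would run get_job/report concurrently across workers.
    asha = ASHA(sample_config=lambda: {"lr": 10 ** random.uniform(-4, -1)})
    for _ in range(200):                        # sequential stand-in for parallel workers
        cid, cfg, rung, resource = asha.get_job()
        loss = abs(cfg["lr"] - 0.01) + random.random() / resource
        asha.report(cid, rung, loss)

Because get_job never waits for a rung to fill up before promoting, stragglers and failed trials do not stall the search, which is what allows the method to scale to hundreds of workers.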
