Scaling Distributed Machine Learning with the Parameter Server

Mu Li

David G. Andersen

Jun Woo Park

Alexander J. Smola

Amr Ahmed

Vanja Josifovski

James Long

Eugene J. Shekita

Bor-Yiing Su

Operating Systems Design and Implementation (OSDI), USENIX (2014), pp. 583-598

Abstract

We propose a parameter server framework for distributed machine learning problems. Both data and workloads are distributed over worker nodes, while the server nodes maintain globally shared parameters, represented as dense or sparse vectors and matrices. The framework manages asynchronous data communication between nodes, and supports flexible consistency models, elastic scalability, and continuous fault tolerance. To demonstrate the scalability of the proposed framework, we show experimental results on petabytes of real data with billions of examples and parameters on problems ranging from Sparse Logistic Regression to Latent Dirichlet Allocation and Distributed Sketching.
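
As a rough illustration of the division of labor the abstract describes, the following is a minimal single-process sketch, not the paper's implementation. It assumes a sparse parameter vector stored as key-value pairs, sharded across server nodes by key, with workers that pull the weights they need, compute a logistic-regression gradient on their own data shard, and push it back. The class and function names (ServerNode, ParameterServer, worker_step), the modulo sharding rule, the learning rate, and the toy data are all hypothetical, and the workers here run one after another rather than asynchronously.

```python
import math
from collections import defaultdict


class ServerNode:
    """Holds one shard of the globally shared sparse parameter vector."""

    def __init__(self):
        # key -> weight; keys never written read as 0.0
        self.weights = defaultdict(float)

    def pull(self, keys):
        return {k: self.weights[k] for k in keys}

    def push(self, grads, lr):
        # Apply a gradient step to the keys owned by this node.
        for k, g in grads.items():
            self.weights[k] -= lr * g


class ParameterServer:
    """Routes each integer key to a server node (assumed modulo sharding)."""

    def __init__(self, num_servers):
        self.nodes = [ServerNode() for _ in range(num_servers)]

    def _node(self, key):
        return self.nodes[key % len(self.nodes)]

    def pull(self, keys):
        out = {}
        for k in keys:
            out.update(self._node(k).pull([k]))
        return out

    def push(self, grads, lr=0.5):
        for k, g in grads.items():
            self._node(k).push({k: g}, lr)


def worker_step(ps, examples):
    """One worker pass: pull the needed weights, compute the gradient, push it."""
    keys = {k for x, _ in examples for k in x}
    w = ps.pull(keys)
    grads = defaultdict(float)
    for x, y in examples:  # x: sparse features {key: value}, y: label in {0, 1}
        z = sum(w[k] * v for k, v in x.items())
        p = 1.0 / (1.0 + math.exp(-z))
        for k, v in x.items():
            grads[k] += (p - y) * v / len(examples)
    ps.push(dict(grads))


# Toy data: each inner list is the data shard held by one worker.
ps = ParameterServer(num_servers=2)
shards = [
    [({0: 1.0, 3: 1.0}, 1), ({1: 1.0}, 0)],
    [({0: 1.0, 2: 1.0}, 1), ({1: 1.0, 3: 1.0}, 0)],
]
for _ in range(50):
    for shard in shards:
        worker_step(ps, shard)
weights = ps.pull([0, 1, 2, 3])
```

In this sketch each worker touches only the keys present in its own examples, so the traffic per step scales with the number of active features rather than with the full parameter vector. Consistency models, elasticity, and fault tolerance are left out entirely.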

Research Areas

Machine Intelligence
