Scaling Distributed Machine Learning with the Parameter Server
Abstract
We propose a parameter server framework for distributed
machine learning problems. Both data and workloads
are distributed over worker nodes, while the server nodes
maintain globally shared parameters, represented as dense
or sparse vectors and matrices. The framework manages
asynchronous data communication between nodes, and
supports flexible consistency models, elastic scalability,
and continuous fault tolerance.
To demonstrate the scalability of the proposed framework,
we show experimental results on petabytes of real
data with billions of examples and parameters on problems
ranging from Sparse Logistic Regression to Latent
Dirichlet Allocation and Distributed Sketching.
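The worker/server split described above can be pictured with a minimal single-process sketch. This is an illustration only, assuming a dense linear-regression objective and synchronous updates; the class and method names (ParameterServer, Worker, push, pull) are chosen for exposition and are not the paper's actual C++ interface, which operates on sparse (key, value) pairs asynchronously.

```python
import numpy as np

class ParameterServer:
    """Server node: maintains the globally shared parameter vector."""
    def __init__(self, dim):
        self.w = np.zeros(dim)

    def pull(self, keys):
        # Workers pull the current values of the parameters they need.
        return self.w[keys]

    def push(self, keys, grads, lr=0.1):
        # Workers push (key, gradient) pairs; the server applies the update.
        self.w[keys] -= lr * grads

class Worker:
    """Worker node: owns one shard of the training data."""
    def __init__(self, X, y):
        self.X, self.y = X, y

    def step(self, server):
        keys = np.arange(self.X.shape[1])   # in practice, only the keys this shard touches
        w = server.pull(keys)                # pull shared parameters
        grad = self.X.T @ (self.X @ w - self.y) / len(self.y)
        server.push(keys, grad)              # push gradients back to the server

# Toy usage: two workers, each holding a data shard, updating shared
# parameters in turn (synchronously here; the real system overlaps
# push/pull calls asynchronously under a chosen consistency model).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
server = ParameterServer(dim=5)
workers = [Worker(X[:50], y[:50]), Worker(X[50:], y[50:])]
for _ in range(20):
    for wk in workers:
        wk.step(server)
```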