Google Research

Distributed Gibbs sampling for latent variable models

  • Arthur Asuncion
  • Padhraic Smyth
  • Max Welling
  • David Newman
  • Ian Porteous
  • Scott Triglia
Scaling up Machine Learning, Cambridge (2012) (to appear)

Abstract

This book presents an integrated collection of representative approaches for scaling up machine learning and data mining methods on parallel and distributed computing platforms. Demand for parallelizing learning algorithms is highly task-specific: in some settings it is driven by the enormous dataset sizes, in others by model complexity or by real-time performance requirements. Making task-appropriate algorithm and platform choices for large-scale machine learning requires understanding the benefits, trade-offs and constraints of the available options. Solutions presented in the book cover a range of parallelization platforms from FPGAs and GPUs to multi-core systems and commodity clusters, concurrent programming frameworks including CUDA, MPI, MapReduce and DryadLINQ, and learning settings (supervised, unsupervised, semi-supervised and online learning). Extensive coverage of parallelization of boosted trees, SVMs, spectral clustering, belief propagation and other popular learning algorithms and deep dives into several applications make the book equally useful for researchers, students and practitioners

Research Areas

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work