Google Research

Reducing the Sampling Complexity of Topic Models

  • Aaron Li
  • Amr Ahmed
  • Sujith Ravi
  • Alexander J Smola
ACM Conference on Knowledge Discovery and Data Mining (KDD) (2014)


Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately, the generative model loses sparsity with the increase in data, requiring O(k) operations per word for k latent states, such as topics. In this paper we propose an algorithm which requires only O(kd) operations per word, where kd is the number of actually instantiated topics in the document. For large document collections and structured hierarchical models kd ≪ k, thus yielding an order of magnitude speedup. Our method is general and it applies to a wide variety of statistical models. At its core is the idea that dense, rapidly changing distributions can be approximated efficiently by the combination of a Metropolis-Hastings step, judicious use of sparsity, and amortized preprocessing via the alias method.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work