Online Inference for the Infinite Cluster-topic Model: Storylines from Streaming Text
Abstract
We present the time-dependent topic-cluster
model, a hierarchical approach for combining
Latent Dirichlet Allocation and clustering via the
Recurrent Chinese Restaurant Process. It inherits
the advantages of both of its constituents, namely
interpretability and concise representation. We
show how it can be applied to streaming collections of objects such as real-world feeds in a news
portal. We describe a parallel Sequential Monte Carlo algorithm for inference
in the resulting graphical model; the algorithm scales to
hundreds of thousands of documents.