Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream
Abstract
Topic models have proven to be a useful tool
for discovering latent structures in document
collections. However, document collections often arrive as temporal streams, and
thus several aspects of the latent structure,
such as the number of topics and the topics' distribution and popularity, are time-evolving.
Several models capture the evolution of some, but not all, of these aspects. In this paper we introduce infinite
dynamic topic models, iDTM, that can accommodate the evolution of all the aforementioned aspects. Our model assumes that documents are organized into epochs, where the
documents within each epoch are exchangeable but the order between the documents
is maintained across epochs. iDTM allows
for an unbounded number of topics: topics can
die or be born at any epoch, and the representation of each topic can evolve according
to Markovian dynamics. We use iDTM to
analyze the birth and evolution of topics in
the NIPS community and evaluate the efficacy of our model on both simulated