Expectation Maximization and Posterior Constraints
Abstract
The expectation maximization (EM) algorithm is a widely used maximum likelihood estimation procedure for statistical models when the values of some of the
variables in the model are not observed. Very often, however, our primary aim is to find a model that assigns the latent variables values that carry the intended
meaning for our data, and maximizing expected likelihood only sometimes accomplishes this. Unfortunately, it is typically difficult to add even simple a priori
information about latent variables in graphical models without making the models
overly complex or intractable. In this paper, we present an efficient, principled
way to inject rich constraints on the posteriors of latent variables into the EM
algorithm. Our method can be used to learn tractable graphical models that satisfy additional, otherwise intractable constraints. Focusing on clustering and the
alignment problem for statistical machine translation, we show that simple, intuitive posterior constraints can greatly improve performance over standard
baselines and be competitive with more complex, intractable models.