Sparse Non-negative Matrix Language Modeling for Geo-annotated Query Session Data

Noam M. Shazeer
Automatic Speech Recognition and Understanding Workshop (ASRU 2015) Proceedings, IEEE (to appear)


The paper investigates the impact on query language modeling of using skip-grams within a query as well as across queries in a given search session, in conjunction with the geo-annotation available for the query stream data. As a modeling tool we use the recently proposed sparse non-negative matrix estimation technique, since it offers the same expressive power as the well-established maximum entropy approach in combining arbitrary context features. Experiments on the google.com query stream show that using session-level and geo-location context we can expect reductions in perplexity of 34% relative over the Kneser-Ney N-gram baseline; when evaluating on the "local" subset of the query stream, the relative reduction in perplexity (PPL) is 51%, more than a bit. Both sources of context information (geo-location and previous queries in session) are about equally valuable in building a language model for the query stream.
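
The "more than a bit" remark can be read as a statement about cross-entropy. Perplexity is the exponential of the cross-entropy, so a relative perplexity reduction of r lowers the cross-entropy by -log2(1 - r) bits per predicted word. The sketch below works through that arithmetic for the two figures quoted in the abstract. The function name `bits_saved` is hypothetical, and reading the remark this way is an interpretation rather than something the abstract spells out.

```python
import math


def bits_saved(relative_ppl_reduction: float) -> float:
    """Cross-entropy drop (bits per word) implied by a relative perplexity reduction.

    If the new perplexity is (1 - r) times the old one, the cross-entropy
    falls by log2(old / new) = -log2(1 - r) bits.
    """
    if not 0.0 <= relative_ppl_reduction < 1.0:
        raise ValueError("relative reduction must be in [0, 1)")
    return -math.log2(1.0 - relative_ppl_reduction)


# Figures quoted in the abstract: 34% overall, 51% on the "local" subset.
for label, reduction in (("full query stream", 0.34), ("local subset", 0.51)):
    print(f"{label}: {bits_saved(reduction):.2f} bits per word")
```

Halving perplexity corresponds to exactly one bit, so a 51% reduction comes out slightly above one bit, while the 34% reduction corresponds to roughly 0.6 bits.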