Google Research

Sparse Non-negative Matrix Language Modeling for Geo-annotated Query Session Data

Automatic Speech Recognition and Understanding Workshop (ASRU 2015) Proceedings, IEEE (to appear)


The paper investigates the impact on query language modeling of using skip-grams within a query as well as across queries in a given search session, in conjunction with the geo-annotation available for the query stream data. As the modeling tool we use the recently proposed sparse non-negative matrix estimation technique, since it offers the same expressive power as the well-established maximum entropy approach in combining arbitrary context features.
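
To make the kinds of context features concrete, the following is a minimal sketch of how skip-gram features within a query, skip-gram features from the previous query in the session, and a geo-location tag might be gathered for a single prediction. It is an illustration only, not the feature extractor used in the paper: the function names, the `(tag, skip, words)` feature layout, the `max_len` and `max_skip` limits, and the example geo label are all assumptions.

```python
from typing import List, Tuple

# Hypothetical feature layout: (source tag, words skipped, word span).
Feature = Tuple[str, int, Tuple[str, ...]]


def skip_gram_features(context: List[str], max_len: int = 2,
                       max_skip: int = 2, tag: str = "in_query") -> List[Feature]:
    """Return contiguous spans of up to max_len words that end `skip`
    words before the position being predicted (skip = 0 is a plain n-gram)."""
    feats: List[Feature] = []
    n = len(context)
    for skip in range(max_skip + 1):
        end = n - skip  # exclusive end index of the span
        for length in range(1, max_len + 1):
            start = end - length
            if start < 0:  # span would run past the start of the context
                break
            feats.append((tag, skip, tuple(context[start:end])))
    return feats


def context_features(prefix: List[str], previous_query: List[str],
                     geo: str) -> List[Feature]:
    """Combine in-query, previous-query, and geo features (assumed scheme)."""
    feats = skip_gram_features(prefix, tag="in_query")
    feats += skip_gram_features(previous_query, tag="prev_query")
    feats.append(("geo", 0, (geo,)))
    return feats


if __name__ == "__main__":
    for f in context_features(["pizza", "near"], ["italian", "restaurants"], "geo:example"):
        print(f)
```

A model such as sparse non-negative matrix estimation would then learn one weight per (feature, predicted word) pair; that estimation step is not shown here.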

Experiments on the query stream show that using session-level and geo-location context we can expect reductions in perplexity of 34% relative over the Kneser-Ney N-gram baseline; when evaluating on the "local" subset of the query stream, the relative reduction in PPL is 51%, more than a bit. Both sources of context information (geo-location, and previous queries in session) are about equally valuable in building a language model for the query stream.
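
The phrase "more than a bit" can be read literally: perplexity is 2 raised to the per-word cross-entropy in bits, so halving perplexity saves exactly one bit per word. The short sketch below converts a relative perplexity reduction into bits per word; the helper name `bits_saved` is an assumption introduced for illustration, and only the 34% and 51% figures come from the text above.

```python
import math


def bits_saved(relative_ppl_reduction: float) -> float:
    """Cross-entropy reduction in bits per word for a relative PPL reduction.

    If PPL_new = (1 - r) * PPL_old and PPL = 2 ** H, then
    H_old - H_new = -log2(1 - r).
    """
    if not 0.0 <= relative_ppl_reduction < 1.0:
        raise ValueError("relative reduction must be in [0, 1)")
    return -math.log2(1.0 - relative_ppl_reduction)


if __name__ == "__main__":
    for r in (0.34, 0.51):
        print(f"{r:.0%} relative PPL reduction = {bits_saved(r):.2f} bits per word")
```

Under this conversion the 51% reduction is slightly over one bit per word, while the 34% reduction is roughly 0.6 bits per word.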
