Google Research

Crafting a lexicon of referential expressions for NLG applications

The 2017 Israeli Seminar of Computational Linguistics, Rachel and Selim Benin School of Computer Science and Engineering, Edmond J. Safra Campus, Jerusalem (2017)


To be perfectly conversational, an agent needs to produce grammatically correct and eloquent sentences. To reach this goal, we use templatic systems with linguistically aware specifications to generate idiomatic utterances, coupled with annotated lexical entities. The morphosyntactic features of the lexical entities are crucial for rendering grammatical and natural-sounding sentences.
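
As a rough illustration of why such features matter, the sketch below fills a single template slot with annotated entities. Everything in it is an assumption made for illustration: the `LexicalEntry` class, its two features (`takes_article`, `plural`), and the template are hypothetical and are not taken from the system described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical lexicon entry: a name plus two morphosyntactic features."""
    name: str
    takes_article: bool  # "the Louvre" vs. "Paris"
    plural: bool         # "The Beatles are" vs. "Paris is"


def refer(entry: LexicalEntry, sentence_initial: bool = False) -> str:
    """Build the referring expression, adding a definite article if required."""
    if not entry.takes_article:
        return entry.name
    article = "The" if sentence_initial else "the"
    return f"{article} {entry.name}"


def render_popularity(entry: LexicalEntry) -> str:
    """Fill the template '<entity> <be> popular.' with subject-verb agreement."""
    verb = "are" if entry.plural else "is"
    return f"{refer(entry, sentence_initial=True)} {verb} popular."


if __name__ == "__main__":
    for entry in (
        LexicalEntry("Louvre", takes_article=True, plural=False),
        LexicalEntry("Beatles", takes_article=True, plural=True),
        LexicalEntry("Paris", takes_article=False, plural=False),
    ):
        print(render_popularity(entry))
```

Without the two features, the same template would yield ungrammatical or unnatural output such as "Louvre is popular." or "The Beatles is popular."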

Existing electronic resources, like dictionaries or thesauri, lack wide-scale information about referential expressions (i.e., proper names). In this work, we focus on the creation of a large-scale lexicon of such referential expressions, relying on n-gram models, morphosyntactic parsing, and non-linguistic knowledge. We describe the linguistic information we collect and the techniques we use to automatically extract it from large text corpora in a way that scales across languages and over millions of entities.
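
One way to picture the n-gram side of such extraction is a simple count-based heuristic: a name that is usually preceded by "the" in running text probably takes a definite article. The sketch below is only a toy under stated assumptions, not the method of this work: the corpus is invented, names are restricted to single lowercase tokens, and the function names and the 0.5 threshold are hypothetical.

```python
from collections import Counter


def count_article_use(sentences, names):
    """Count how often each name occurs, and how often it follows 'the'."""
    total = Counter()
    with_the = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in names:
                total[token] += 1
                if i > 0 and tokens[i - 1] == "the":
                    with_the[token] += 1
    return total, with_the


def takes_article(name, total, with_the, threshold=0.5):
    """Guess the feature from the share of 'the <name>' bigrams.

    Returns None when the name was never observed. The threshold is an
    arbitrary illustrative value.
    """
    if total[name] == 0:
        return None
    return with_the[name] / total[name] >= threshold


# Invented toy corpus, for illustration only.
CORPUS = [
    "we visited the louvre yesterday",
    "the louvre was crowded",
    "paris is lovely",
    "we flew to paris",
]

if __name__ == "__main__":
    total, with_the = count_article_use(CORPUS, {"louvre", "paris"})
    for name in ("louvre", "paris"):
        print(name, takes_article(name, total, with_the))
```

A realistic pipeline would need multi-token names, proper tokenization, and language-specific cues; the sketch only shows how a morphosyntactic feature could be read off n-gram counts.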
