Lexical generalization in CCG grammar induction for semantic parsing
Abstract
We consider the problem of learning factored probabilistic CCG grammars for semantic
parsing from data containing sentences paired with logical-form meaning representations.
Traditional CCG lexicons list lexical items that pair words and phrases with syntactic
and semantic content. Such lexicons can be inefficient when words appear repeatedly
with closely related lexical content. In this paper, we introduce factored lexicons, which
include both lexemes to model word meaning and templates to model systematic variation in
word usage. We also present an algorithm for learning factored CCG lexicons, along with a
probabilistic parse-selection model. Evaluations on benchmark datasets demonstrate that
the approach learns highly accurate parsers, whose generalization performance benefits
greatly from the lexical factoring.