Statistical Machine Translation for Query Expansion in Answer Retrieval
Abstract
This paper presents a novel approach to query expansion in answer retrieval that uses Statistical Machine Translation (SMT) techniques to bridge the lexical gap between questions and answers. SMT-based query expansion is performed in two ways: on the one hand, an SMT-based full-sentence paraphraser introduces synonyms in the context of the full query; on the other hand, an SMT model trained on question-answer pairs expands queries with answer terms taken from translations of full queries. We compare these global, context-aware query expansion techniques with a tf-idf baseline model and with local query expansion on a database of 10 million question-answer pairs extracted from FAQ pages. Experimental results show a significant improvement of SMT-based query expansion over both baselines.
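The general idea of expanding a query with terms drawn from full-query translations can be illustrated with a small sketch. It is not the authors' system. The n-best list is hard-coded as a stand-in for real SMT output (paraphrases or answer-side translations). The helper names `expand_query` and `tfidf_score` are hypothetical. Selecting new terms by how many hypotheses propose them, and using a plain tf-idf dot product for scoring, are simplifying assumptions. The paper's actual term weighting and retrieval model may differ.

```python
import math
from collections import Counter


def expand_query(query, nbest, max_new_terms=3):
    """Hypothetical helper (assumption): add terms from n-best SMT outputs
    that do not already occur in the query."""
    original = query.lower().split()
    seen = set(original)
    counts = Counter()
    for hyp in nbest:
        # Count each new term at most once per hypothesis.
        counts.update({t for t in hyp.lower().split() if t not in seen})
    # Rank candidates by how many hypotheses propose them; break ties alphabetically.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return original + [t for t, _ in ranked[:max_new_terms]]


def tfidf_score(query_terms, doc, corpus):
    """Plain tf-idf dot product between a query and one document (assumed variant)."""
    n = len(corpus)
    tokenized = [d.lower().split() for d in corpus]
    tf = Counter(doc.lower().split())
    score = 0.0
    for term in set(query_terms):
        df = sum(1 for toks in tokenized if term in toks)
        if df == 0 or tf[term] == 0:
            continue
        score += tf[term] * math.log(n / df)
    return score


# Toy data (hypothetical): a query, stand-in SMT n-best outputs, two FAQ answers.
query = "how to fix a flat tire"
nbest = [
    "how to repair a flat tire",
    "how to repair a punctured tyre",
    "patch a flat tire",
]
answers = [
    "repair the puncture with a patch kit",
    "a flat battery needs a jump start",
]

expanded = expand_query(query, nbest)
baseline = [tfidf_score(query.split(), a, answers) for a in answers]
with_expansion = [tfidf_score(expanded, a, answers) for a in answers]
print(expanded)
print(baseline, with_expansion)
```

In this toy example, the unexpanded query shares only "flat" (and the zero-weight "a") with the answers, so the off-topic battery answer ranks first. After expansion, "repair" and "patch" match the relevant answer and it ranks first. This is the kind of lexical-gap effect the abstract describes, shown here at the scale of two documents only.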