Learning to Rank Answers to Non-Factoid Questions from Web Collections
Abstract
This work investigates the use of linguistically motivated features to improve search, in particular for ranking answers to non-factoid questions. We show that it is possible to exploit
existing large collections of question–answer pairs (from online social Question Answering sites)
to extract such features and train ranking models which combine them effectively. We investigate
a wide range of feature types, some exploiting natural language processing such as coarse word
sense disambiguation, named-entity identification, syntactic parsing, and semantic role labeling. Our experiments demonstrate that linguistic features, in combination, yield considerable
improvements in accuracy. Depending on the system settings, we measure relative improvements
of 14% to 21% in Mean Reciprocal Rank and Precision@1, providing some of the most compelling
evidence to date that complex linguistic features such as word senses and semantic roles can have
a significant impact on large-scale information retrieval tasks.