Identifying Well-formed Natural Language Questions
Abstract
Understanding natural language queries is fundamental to many practical NLP systems.
Often, such systems comprise a brittle processing pipeline that is not robust to the "word salad" text ubiquitously issued by users.
However, if a query resembles a grammatical and well-formed question, such a pipeline can interpret it more accurately, thus reducing compounding errors downstream.
Hence, identifying whether or not a query is well formed can enhance query understanding.
Here, we introduce a new task of identifying a well-formed natural language question.
We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-well-formed categories and report an accuracy of 70.7% on the test set.
We also show that our classifier can be used to improve the performance of a neural sequence-to-sequence model for generating questions for reading comprehension.