Summarizing News Articles using Question-and-Answer Pairs via Learning

Cong Yu


The launch of the new Google News in 2018 introduced the Frequently asked questions feature, which structurally summarizes a news story on its full coverage page. While news summarization has been a research topic for decades, this new feature is poised to usher in a new line of news summarization techniques. There are two fundamental approaches: mining the questions from data associated with the news story, and learning the questions directly from the content of the story. This paper provides the first study, to the best of our knowledge, of a learning-based approach that generates a structured summary of news articles as question-and-answer pairs capturing salient and interesting aspects of the news story. Specifically, this learning-based approach reads a news article, predicts its attention map (i.e., the important snippets in the article), and generates multiple natural language questions corresponding to each snippet. Furthermore, we describe a mining-based approach as the mechanism for generating weak supervision data to train the learning-based approach. We evaluate our approach on the existing SQuAD dataset and on a large dataset of 91K news articles that we constructed. We show that our proposed system achieves an AUC of 0.734 for document attention map prediction, a BLEU-4 score of 12.46 for natural question generation, and a BLEU-4 score of 24.4 for question summarization, beating state-of-the-art baselines.