Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation

Dongha Lee
Jiaming Shen
Seonghyeon Lee
Susik Yoon
Hwanjo Yu
Jiawei Han
Proceedings of The Findings of the 2022 Conference on Empirical Methods in Natural Language Processing

Abstract

Topic taxonomies that display the hierarchical topic structure of a text corpus have contributed substantially to various knowledge-rich applications, including web search and question answering. Recently, to expand topic knowledge effectively, there have been several attempts to expand (or complete) a topic taxonomy by inserting new topic nodes found in a given corpus. However, the taxonomies produced by existing expansion methods have shown limited quality, both in covering a wide variety of topic terms and in representing consistent topic relations. This is because their ability to discover novel topics relies on recursive inference of first-order topic relations (i.e., topic-subtopic) based on term embeddings. To tackle this challenge, we propose TopicExpan, which directly generates topic-related terms (i.e., phrases) from relevant documents while considering the relation structure surrounding a target topic in the hierarchy. That is, TopicExpan trains a topic-conditional term generator that captures the interaction among a topic, a document, and a topic-related term. It then uses the trained generator, together with a virtual topic node newly inserted at each valid position in the hierarchy, to collect the terms that should belong to the new topic. Experimental results demonstrate that TopicExpan significantly outperforms baseline methods in novel topic discovery, which results in better coverage of multi-word terms and higher consistency of topic relations.
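The expansion step described above (insert a virtual topic node at each valid position, then run the trained generator over documents to collect terms for it) can be sketched as follows. This is a minimal, hypothetical illustration, not TopicExpan's implementation: `virtual_positions`, `expand`, `make_toy_generator`, and the `max_depth` rule for what counts as a valid position are all assumptions, and the toy generator only stands in for the trained topic-conditional term generator (it copies an unseen word from any document that mentions the virtual node's parent topic).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set

# Hypothetical taxonomy format: parent topic -> list of child topics.
Taxonomy = Dict[str, List[str]]
# Stand-in for the trained generator: (topic path, document) -> term or None.
Generator = Callable[[List[str], str], Optional[str]]

VIRTUAL = "<virtual>"  # placeholder name for the newly inserted topic node


def virtual_positions(taxonomy: Taxonomy, root: str, max_depth: int) -> List[List[str]]:
    """Return the ancestor path of every node that receives a virtual child.

    Assumption (hypothetical): a position is "valid" if the parent's depth
    is below max_depth.
    """
    positions: List[List[str]] = []
    stack = [[root]]
    while stack:
        path = stack.pop()
        if len(path) - 1 < max_depth:
            positions.append(path)
            for child in taxonomy.get(path[-1], []):
                stack.append(path + [child])
    return positions


def expand(taxonomy: Taxonomy, root: str, max_depth: int,
           documents: List[str], generate_term: Generator) -> Dict[str, Set[str]]:
    """Collect generated terms for the virtual child of each valid parent."""
    collected: Dict[str, Set[str]] = defaultdict(set)
    for path in virtual_positions(taxonomy, root, max_depth):
        # Ancestor path plus the virtual node: the hierarchy context
        # handed to the generator for the new topic.
        virtual_path = path + [VIRTUAL]
        for doc in documents:
            term = generate_term(virtual_path, doc)
            if term is not None:
                collected[path[-1]].add(term)
    return dict(collected)


def make_toy_generator(known: Set[str]) -> Generator:
    """Toy stand-in (NOT the paper's model): if a document mentions the
    virtual node's parent topic, return its first word that is not already
    a known topic; otherwise return None."""
    def generate(virtual_path: List[str], doc: str) -> Optional[str]:
        parent = virtual_path[-2]
        words = doc.split()
        if parent not in words:
            return None
        novel = [w for w in words if w not in known]
        return novel[0] if novel else None
    return generate


if __name__ == "__main__":
    taxonomy = {"root": ["sports", "science"], "sports": ["soccer"]}
    known = {"root", "sports", "science", "soccer"}
    docs = ["sports tennis match", "science physics lecture"]
    print(expand(taxonomy, "root", 2, docs, make_toy_generator(known)))
    # prints {'science': {'physics'}, 'sports': {'tennis'}}
```

Attaching a virtual node under every valid parent lets a single generator score all candidate positions in one pass. The source describes a generator that also considers the relation structure surrounding the target topic; the name path used here is only a crude placeholder for that hierarchy-aware conditioning.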