ConceptVector: Text Visual Analytics via Interactive Lexicon Building using Word Embedding

Deok Gun Park
Jurim Lee
Jaegul Choo
Nicholas Diakopoulos
Niklas Elmqvist
IEEE Transactions on Visualization and Computer Graphics (TVCG) (2017)
Central to many text analysis methods is the notion of a concept: a set of semantically related keywords characterizing a specific object, phenomenon, or theme. Advances in word embedding allow building such concepts from a small set of seed terms. However, naive application of such techniques may result in false positives because of the polysemy of human language. To mitigate this problem, we present a visual analytics system called ConceptVector that guides the user in building such concepts and then using them to analyze documents. Document-analysis case studies with real-world datasets demonstrate the fine-grained analysis provided by ConceptVector. To support the elaborate modeling of concepts using user seed terms, we introduce a bipolar concept model and support for irrelevant words. We validate the interactive lexicon-building interface via a user study and expert reviews. The quantitative evaluation shows that bipolar lexicons generated with our methods are comparable to human-generated ones.
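To make the idea of a bipolar concept with irrelevant words concrete, the following is a minimal sketch, not the ConceptVector implementation. It assumes a tiny hand-made embedding table (the vectors are invented for illustration) and a simple scoring rule chosen for this example: mean cosine similarity to the positive seeds minus mean cosine similarity to the negative seeds, with a candidate dropped when it sits closer to the irrelevant seeds than to either pole. The names `EMB`, `mean_sim`, and `bipolar_scores` are hypothetical.

```python
import numpy as np

# Toy 3-d embeddings, invented for illustration only.
# A real system would load pretrained word vectors instead.
EMB = {
    "happy":   [0.90, 0.10, 0.00],
    "joyful":  [0.80, 0.20, 0.10],
    "glad":    [0.85, 0.15, 0.05],
    "sad":     [-0.90, 0.10, 0.00],
    "gloomy":  [-0.80, 0.20, 0.10],
    "unhappy": [-0.85, 0.10, 0.05],
    "table":   [0.00, 0.10, 0.90],
    "chair":   [0.05, 0.00, 0.95],
}

def unit(v):
    """Return v scaled to unit length so dot products are cosine similarities."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

VEC = {word: unit(vec) for word, vec in EMB.items()}

def mean_sim(word, seeds):
    """Mean cosine similarity between `word` and a collection of seed words."""
    if not seeds:
        return 0.0
    return float(np.mean([VEC[word] @ VEC[s] for s in seeds]))

def bipolar_scores(positive, negative, irrelevant=()):
    """Score every non-seed word on a positive-to-negative axis.

    Assumed rule (not taken from the paper): score = sim(positive) - sim(negative).
    Words at least as close to the irrelevant seeds as to either pole are dropped.
    """
    seeds = set(positive) | set(negative) | set(irrelevant)
    scores = {}
    for word in VEC:
        if word in seeds:
            continue
        pos = mean_sim(word, positive)
        neg = mean_sim(word, negative)
        if irrelevant and mean_sim(word, irrelevant) >= max(pos, neg):
            continue  # closer to the irrelevant seeds than to the concept
        scores[word] = pos - neg
    return scores

scores = bipolar_scores(
    positive=["happy", "joyful"],
    negative=["sad", "gloomy"],
    irrelevant=["table"],
)
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:8s} {score:+.2f}")
```

In this toy setting, "glad" lands on the positive side, "unhappy" on the negative side, and "chair" is filtered out because it is nearest to the irrelevant seed "table". Marking words as irrelevant is one plausible way to let a user steer the lexicon away from unwanted senses of a polysemous seed.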