Improving topic clustering on search queries with word co-occurrence and bipartite graph co-clustering
Abstract
Uncovering common themes in a large number of unorganized search queries is a primary step in mining insights about aggregated user interests. Common topic modeling techniques designed for documents often face sparsity problems with search query data, because queries are much shorter than documents. We present two novel techniques that can discover semantically meaningful topics in search queries: i) word co-occurrence clustering generates topics from words that frequently occur together; ii) weighted bigraph clustering uses URLs from Google search results to induce query similarity and generate topics. We illustrate the proposed methods on Lipton brand queries as well as make-up & cosmetics queries. A comparison to standard LDA clustering demonstrates the usefulness and improved performance of the two proposed methods.
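To make the two ideas concrete, the following Python sketch illustrates them on toy data. It is a simplified stand-in and not the algorithms developed in the paper: topics are taken to be connected components of a thresholded word co-occurrence graph, and query similarity is taken to be a weighted Jaccard score over shared result URLs. The function names, the `min_count` threshold, the example queries, and the placeholder URLs and weights are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_topics(queries, min_count=2):
    """Group words into topics via a thresholded co-occurrence graph.

    Simplification (assumption): a topic is a connected component of the
    graph whose edges link words co-occurring in >= min_count queries.
    """
    pair_counts = Counter()
    for query in queries:
        words = sorted(set(query.lower().split()))
        pair_counts.update(combinations(words, 2))

    # Keep only edges between words that co-occur often enough.
    neighbors = defaultdict(set)
    for (a, b), count in pair_counts.items():
        if count >= min_count:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Collect connected components with an iterative depth-first search.
    topics, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            word = stack.pop()
            if word in component:
                continue
            component.add(word)
            stack.extend(neighbors[word] - component)
        seen |= component
        topics.append(component)
    return topics


def url_similarity(query_urls, q1, q2):
    """Weighted Jaccard similarity of two queries over their result URLs.

    query_urls maps query -> {url: weight}; the weights are hypothetical
    (for example, rank-based scores of the search results).
    """
    w1, w2 = query_urls[q1], query_urls[q2]
    urls = set(w1) | set(w2)
    num = sum(min(w1.get(u, 0.0), w2.get(u, 0.0)) for u in urls)
    den = sum(max(w1.get(u, 0.0), w2.get(u, 0.0)) for u in urls)
    return num / den if den else 0.0


# Toy data, invented for illustration only.
queries = [
    "lipton green tea",
    "green tea benefits",
    "lipton iced tea",
    "red lipstick",
    "matte red lipstick",
]
topics = cooccurrence_topics(queries, min_count=2)

query_urls = {
    "green tea": {"a.example": 1.0, "b.example": 0.5},
    "iced tea": {"a.example": 0.5},
    "lipstick": {"c.example": 1.0},
}
sim_tea = url_similarity(query_urls, "green tea", "iced tea")
sim_mixed = url_similarity(query_urls, "green tea", "lipstick")
```

On this toy input, the tea-related words and the lipstick-related words fall into separate components, and the two tea queries receive a nonzero similarity because they share a URL, while the tea and lipstick queries share none. A similarity of this kind could then feed any graph or co-clustering step; which one is appropriate depends on the method actually used.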