- Weize Kong
- Swaraj Khadanga
- Cheng Li
- Shaleen Gupta
- Mingyang Zhang
- Wensong Xu
- Mike Bendersky
Abstract
Prior work in Dense Retrieval usually encodes queries and documents as single-vector representations (also called embeddings) and performs retrieval in the embedding space using approximate nearest neighbor search. This paradigm enables efficient semantic retrieval. However, single-vector representations can be ineffective at capturing the different aspects of queries and documents that matter for relevance matching, especially in vertical domains. For example, in e-commerce search, these aspects could include category, brand, and color. Given the query "white nike socks", a Dense Retrieval model may mistakenly retrieve "white adidas socks", missing the intended brand. We propose to explicitly represent multiple aspects using one embedding per aspect. We introduce an aspect prediction task to teach the model to capture aspect information with the corresponding aspect embeddings, and we design a lightweight network to fuse the aspect embeddings into query and document representations. Our evaluation on an e-commerce dataset shows substantial improvements over strong Dense Retrieval baselines. We also find that the proposed aspect embeddings enhance the interpretability of Dense Retrieval models as a byproduct.
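To make the architecture described above more concrete, the sketch below shows one plausible way to realize per-aspect embeddings, an aspect prediction head, and a lightweight fusion network on top of a generic text encoder. This is a minimal illustration under assumed names and dimensions (e.g., `AspectFusionEncoder`, three aspects, the listed vocabulary sizes); it is not the paper's actual implementation.

```python
# Illustrative sketch: per-aspect embeddings with a gated fusion layer.
# All class/variable names and sizes are assumptions for demonstration only.
import torch
import torch.nn as nn

class AspectFusionEncoder(nn.Module):
    def __init__(self, hidden_dim=768, embed_dim=128,
                 aspect_vocab_sizes=(50, 200, 30)):  # e.g., category, brand, color
        super().__init__()
        num_aspects = len(aspect_vocab_sizes)
        # One projection head per aspect produces that aspect's embedding.
        self.aspect_heads = nn.ModuleList(
            [nn.Linear(hidden_dim, embed_dim) for _ in range(num_aspects)]
        )
        # Auxiliary classifiers for the aspect prediction task:
        # each predicts the aspect value (e.g., a brand id) from its embedding.
        self.aspect_classifiers = nn.ModuleList(
            [nn.Linear(embed_dim, v) for v in aspect_vocab_sizes]
        )
        # Lightweight fusion: a gate that weights the aspect embeddings
        # before summing them into a single retrieval vector.
        self.gate = nn.Linear(num_aspects * embed_dim, num_aspects)

    def forward(self, pooled_hidden):
        # pooled_hidden: [batch, hidden_dim] from any text encoder (e.g., a BERT [CLS] vector).
        aspect_embs = [head(pooled_hidden) for head in self.aspect_heads]
        aspect_logits = [clf(e) for clf, e in zip(self.aspect_classifiers, aspect_embs)]
        stacked = torch.stack(aspect_embs, dim=1)                        # [batch, num_aspects, embed_dim]
        weights = torch.softmax(self.gate(stacked.flatten(1)), dim=-1)   # [batch, num_aspects]
        fused = (weights.unsqueeze(-1) * stacked).sum(dim=1)             # [batch, embed_dim]
        return fused, aspect_logits

# Usage: score query/document pairs by dot product of the fused embeddings;
# cross-entropy losses on aspect_logits would supply the aspect prediction signal.
encoder = AspectFusionEncoder()
query_hidden = torch.randn(4, 768)   # stand-in for encoder output
doc_hidden = torch.randn(4, 768)
q_vec, _ = encoder(query_hidden)
d_vec, _ = encoder(doc_hidden)
scores = (q_vec * d_vec).sum(dim=-1)
```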