Bhargav Kanagal
Authored Publications
ShopTalk: A System for Conversational Faceted Search
D. Sivakumar
Ebenezer Omotola Anjorin
Gurmeet Singh Manku
Ilya Eckstein
James Patrick Lee-Thorp
Jim Rosswog
Jingchen Feng
Joshua Ainslie
Larry Adams
Michael Anthony Pohl
Sudeep Gandhe
Zach Pearson
SIGIR eCom '22 (2022)
We present ShopTalk, a multi-turn conversational faceted search system for Shopping that is designed to handle large and complex schemas beyond the scope of state-of-the-art slot-filling systems. ShopTalk decouples dialog management from fulfillment, thereby allowing the dialog understanding system to be domain-agnostic and not tied to the particular Shopping application. The dialog understanding system consists of a deep-learned Contextual Language Understanding module, which interprets user utterances, and a primarily rules-based Dialog-State Tracker (DST), which updates the dialog state and formulates search requests intended for the fulfillment engine. The interface between the two modules consists of a minimal set of domain-agnostic "intent operators," which instruct the DST on how to update the dialog state. ShopTalk was deployed in 2020 on the Google Assistant for Shopping searches.
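The intent-operator interface is concrete enough to sketch. Below is a minimal, hypothetical illustration of a rules-based dialog-state tracker driven by domain-agnostic operators; the operator names (`SET_FILTER`, `CLEAR_FILTER`, `RESET`) and state fields are invented for illustration and are not ShopTalk's actual interface.

```python
class DialogState:
    """Toy dialog state for faceted search: a set of facet filters."""

    def __init__(self):
        self.filters = {}  # facet -> value, e.g. "brand" -> "Acme"

    def apply(self, op, facet=None, value=None):
        # Domain-agnostic intent operators update the state; the tracker
        # never needs to know anything Shopping-specific.
        if op == "SET_FILTER":
            self.filters[facet] = value
        elif op == "CLEAR_FILTER":
            self.filters.pop(facet, None)
        elif op == "RESET":
            self.filters.clear()
        else:
            raise ValueError(f"unknown intent operator: {op}")

    def to_search_request(self):
        # Formulate a request for the fulfillment engine from the state.
        return {"query_filters": dict(self.filters)}


state = DialogState()
state.apply("SET_FILTER", "category", "running shoes")
state.apply("SET_FILTER", "color", "blue")
state.apply("CLEAR_FILTER", "color")  # e.g. "actually, any color is fine"
print(state.to_search_request())  # {'query_filters': {'category': 'running shoes'}}
```

The point of the decoupling is that the understanding module emits only operators like these, so the same tracker logic could serve a different vertical with a different schema.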
MAVE: A Product Dataset for Multi-source Attribute Value Extraction
Qifan Wang
Anand Kulkarni
Bin Shu
Jon Elsas
WSDM 2022 (2022)
Attribute value extraction refers to the task of identifying values of an attribute of interest from product information. Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product ranking, retrieval, and recommendation. In the real world, however, the attribute values of a product are usually incomplete and vary over time, which greatly hinders practical applications.
In this paper, we introduce MAVE, a new dataset to facilitate research on product attribute value extraction. MAVE is composed of a curated set of 2.2 million products from Amazon pages, with 3 million attribute-value annotations across 1257 unique categories. MAVE has four main advantages. First, MAVE is the largest product attribute value extraction dataset to date, with 8x as many attribute-value examples as any previous dataset. Second, MAVE includes multi-source representations of each product, which capture the full product information with high attribute coverage. Third, MAVE represents a more diverse set of attributes and values than previous datasets cover.
Lastly, MAVE provides a very challenging zero-shot test set, as we empirically illustrate in the experiments. We further propose a novel approach that effectively extracts the attribute value from the multi-source product information. We conduct extensive experiments with several baselines and show that MAVE is challenging for the latest attribute value extraction models, especially on zero-shot attribute extraction.
RealFormer: Transformer Likes Residual Attention
Transformer is the backbone of modern NLP models. In this paper, we propose RealFormer, a simple and generic technique to create Residual Attention Layer Transformer networks that significantly outperform the canonical Transformer and its variants of different sizes on a wide spectrum of tasks and benchmarks, including Masked Language Modeling, GLUE, SQuAD, Neural Machine Translation, WikiHop, HotpotQA, Natural Questions, and OpenKP. Qualitatively, RealFormer stabilizes training and leads to models with sparser attention. Code and pre-trained checkpoints will be open-sourced.
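The core mechanism is easy to sketch: each attention layer adds the previous layer's raw (pre-softmax) attention scores to its own before normalizing, forming a residual path over attention logits. The NumPy code below is a single-head toy illustration under assumed shapes, not the paper's implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def attention_layer(x, Wq, Wk, Wv, prev_scores=None):
    # Standard scaled dot-product attention, except the raw (pre-softmax)
    # scores of the previous layer are added as a residual edge.
    d = Wq.shape[1]
    scores = (x @ Wq) @ (x @ Wk).T / np.sqrt(d)
    if prev_scores is not None:
        scores = scores + prev_scores  # residual attention
    return softmax(scores) @ (x @ Wv), scores


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))  # 5 tokens, hidden size 8 (toy dimensions)
layers = [tuple(rng.normal(size=(8, 8)) for _ in range(3)) for _ in range(2)]

prev = None
for Wq, Wk, Wv in layers:
    x, prev = attention_layer(x, Wq, Wk, Wv, prev)
```

Because the residual lives on the logits rather than the normalized weights, later layers can sharpen or suppress earlier attention patterns, which is consistent with the sparser attention the abstract reports.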
DOCENT: Learning Self-Supervised Entity Representations from Large Document Collections
Yury Zemlyanskiy
Sudeep Gandhe
Ruining He
Anirudh Ravula
Juro Gottweis
Ilya Eckstein
Proceedings of EACL (2021) (to appear)
This paper explores learning rich self-supervised entity representations from large amounts of associated text. Once pre-trained, these models become applicable to multiple entity-centric tasks such as ranked retrieval in search, knowledge base completion, question answering, and more. Unlike other methods that harvest self-supervision signals based merely on a local context within a sentence, we radically expand the notion of context to include any available text related to an entity. With the breadth and depth of textual content available on the web, this approach enables a new class of powerful, high-capacity representations that can ultimately "remember" any useful information about an entity, without the need for human annotations.
We present several training strategies that jointly learn to predict words and entities, and compare these strategies experimentally on downstream tasks in the TV-Movies domain, such as MovieLens tag prediction from user reviews and natural language movie search. As the results show, our models outperform competitive baselines, sometimes with little or no fine-tuning, and are also able to scale to very large corpora.
Finally, we make our datasets and pre-trained models publicly available (to be released after the review period). This includes Reviews2Movielens, which maps the 1B-word corpus of Amazon movie reviews to MovieLens tags, as well as Reddit Movie Suggestions, which contains natural language queries and corresponding community recommendations.
Learning to Extract Attribute Value from Product via Question Answering: A Multi-task Approach
Qifan Wang
D. Sivakumar
Bin Shu
Jon Elsas
SIGKDD 2020 (2020)
Attribute value extraction refers to the task of identifying values of an attribute of interest from product information.
It is an important research topic that has been widely studied in e-commerce and relation learning.
There are two main limitations in existing attribute value extraction methods: scalability and generalizability.
Most existing methods treat each attribute independently and build separate models for each of them, which are not suitable for large scale attribute systems in real-world applications.
Moreover, very limited research has focused on generalizing extraction to new attributes.
In this work, we propose a novel approach for Attribute Value Extraction via Question Answering (AVEQA) using a multi-task framework.
In particular, we build a question answering model which treats each attribute as a question and identifies the answer span corresponding to the attribute value in the product context.
A single BERT contextual encoder is adopted and shared across all attributes to encode both the context and the question, which makes the model scalable.
A distilled masked language model with knowledge distillation loss is introduced to improve the model generalization ability. In addition, we employ a no-answer classifier to explicitly handle the cases where there are no values for a given attribute in the product context.
The question answering, distilled masked language model and the no answer classification are then combined into a unified multi-task framework.
We conduct extensive experiments on a public dataset. The results demonstrate that the proposed approach outperforms several state-of-the-art methods by a large margin.
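The extraction step can be sketched as ordinary QA span selection plus an explicit no-answer comparison. The toy code below assumes per-token start/end logits and a no-answer logit have already been produced by the shared encoder for the question "Material?"; the tokens and logit values are fabricated for illustration.

```python
import numpy as np


def extract_value(start_logits, end_logits, no_answer_logit, max_len=5):
    # Score every candidate span as start_logit[i] + end_logit[j], then
    # compare the best span against the no-answer score, mirroring an
    # explicit no-answer classifier.
    n = len(start_logits)
    best, best_score = None, -np.inf
    for i in range(n):
        for j in range(i, min(i + max_len, n)):
            s = start_logits[i] + end_logits[j]
            if s > best_score:
                best, best_score = (i, j), s
    if best_score <= no_answer_logit:
        return None  # the attribute has no value in this context
    return best


tokens = ["soft", "cotton", "crew", "neck", "tee"]
start = np.array([0.1, 2.0, 0.3, 0.2, 0.1])
end = np.array([0.0, 1.5, 0.4, 0.3, 0.1])

span = extract_value(start, end, no_answer_logit=1.0)
print([tokens[i] for i in range(span[0], span[1] + 1)])  # ['cotton']
```

Treating the attribute as the question is what makes one model cover thousands of attributes: scaling to a new attribute means asking a new question, not training a new extractor.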
Constructing a Comprehensive Events Database from the Web
Qifan Wang
Vijay Garg
D. Sivakumar
The 28th ACM International Conference on Information and Knowledge Management (CIKM 2019) (2019)
Recommendations for all: solving thousands of recommendation problems a day
Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE) (2018) (to appear)
Recommendations are known to be an important part of several online experiences. Outside of media recommendation (music, movies, etc.), online retailers have made use of product recommendations to help users make purchases. Product recommendation tends to be hard because of the twin problems of sparsity and cold-start. Building a recommendation system that performs well in this setting is generally considered to require expert tuning. However, all online retailers need to solve this problem well to provide good recommendations.
In this paper, we tackle this problem and describe an industrial-scale system called Sigmund, in which we solve tens of thousands of instances of the recommendation problem as a service for various online retailers. Sigmund was deployed to production in early 2014 and has been serving thousands of retailers. We describe several design decisions that we made in building Sigmund, and share some of the lessons we learned from this experience, both from a machine learning perspective and a systems perspective. We hope that these lessons are useful for building future machine-learning services.
A Generic Coordinate Descent Framework for Learning from Implicit Feedback
Immanuel Bayer
Xiangnan He
Proceedings of the 26th International Conference on World Wide Web (2017), pp. 1341-1350
In recent years, interest in recommender research has shifted from explicit feedback towards implicit feedback data. A diversity of complex models has been proposed for a wide variety of applications. Despite this, learning from implicit feedback is still computationally challenging. So far, most work relies on stochastic gradient descent (SGD) solvers which are easy to derive, but in practice challenging to apply, especially for tasks with many items. For the simple matrix factorization model, an efficient coordinate descent (CD) solver has been previously proposed. However, efficient CD approaches have not been derived for more complex models.
In this paper, we provide a new framework for deriving efficient CD algorithms for complex recommender models. We identify and introduce the property of k-separable models. We show that k-separability is a sufficient property to allow efficient optimization of implicit recommender problems with CD. We illustrate this framework on a variety of state-of-the-art models including factorization machines and Tucker decomposition. To summarize, our work provides the theory and building blocks to derive efficient implicit CD algorithms for complex recommender models.
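The efficiency argument rests on the same observation that makes implicit-feedback solvers tractable: a loss term summed over all items (the implicit negatives) collapses to a quadratic form in a precomputed item Gramian, so per-user work no longer scales with the item count. Below is a minimal numerical check of that identity; it is an illustration of the underlying trick, not the paper's full coordinate descent algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, k = 10000, 8
Q = rng.normal(size=(n_items, k))  # item factor matrix
p = rng.normal(size=k)             # one user's factor vector

# Naive: the "all items as negatives" term costs O(n_items * k) per user.
naive = sum((p @ q) ** 2 for q in Q)

# Gramian trick: precompute G = Q^T Q once (O(n_items * k^2)), after
# which the same term costs only O(k^2) per user or per coordinate.
G = Q.T @ Q
fast = p @ G @ p

assert np.allclose(naive, fast)
```

The framework's k-separability condition is, informally, what guarantees a model admits this kind of factorization of the negative term, so the same precomputation carries over from plain matrix factorization to richer models like factorization machines.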
Latent Factor Models with Additive Hierarchically-smoothed User Preferences
Sandeep Pandey
Vanja Josifovski
Lluis Garcia-Pueyo
Proceedings of The 6th ACM International Conference on Web Search and Data Mining (WSDM) (2013)
Items in recommender systems are usually associated with annotated attributes, such as brand and price for products, or agency for news articles. These attributes are highly informative and must be exploited for accurate recommendation. While learning a user preference model over these attributes can yield an interpretable recommender system and can handle the cold-start problem, it suffers from two major drawbacks: data sparsity and the inability to model random effects. On the other hand, latent-factor collaborative filtering models have shown great promise in recommender systems; however, their performance on rare items is poor. In this paper we propose a novel model, LFUM, which provides the advantages of both of the above models. We learn user preferences (over the attributes) using a personalized Bayesian hierarchical model that uses a combination (additive model) of a globally learned preference model along with user-specific preferences. To combat data sparsity, we smooth these preferences over the item taxonomy using an efficient forward-filtering and backward-smoothing inference algorithm. Our inference algorithms can handle both discrete attributes (e.g., item brands) and continuous attributes (e.g., prices). We combine the user preferences with the latent-factor models and train the resulting collaborative filtering system end-to-end using the successful BPR ranking algorithm. In our experimental analysis, we show that our proposed model outperforms several commonly used baselines, and we carry out an ablation study showing the benefits of each component of our model.
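The taxonomy smoothing can be pictured as a simple shrinkage recursion: nodes with little data lean on their parent's smoothed preference. The sketch below is a stand-in for the paper's forward-filtering/backward-smoothing inference; the tree, counts, and blending weight `alpha` are invented for illustration.

```python
def smooth(tree, estimates, counts, node, alpha=5.0, parent_value=0.0):
    # Shrinkage over a taxonomy: a node's smoothed preference blends its
    # own (possibly noisy) estimate with its parent's smoothed value,
    # weighted by how much data the node has.
    n = counts.get(node, 0)
    value = (n * estimates.get(node, 0.0) + alpha * parent_value) / (n + alpha)
    out = {node: value}
    for child in tree.get(node, []):
        out.update(smooth(tree, estimates, counts, child, alpha, value))
    return out


taxonomy = {"electronics": ["phones", "laptops"]}
raw = {"electronics": 0.4, "phones": 0.9}  # no observations for "laptops"
counts = {"electronics": 50, "phones": 3}

smoothed = smooth(taxonomy, raw, counts, "electronics")
print(smoothed)
```

A node with no data ("laptops") simply inherits its parent's smoothed preference, while a sparsely observed node ("phones") lands between its own noisy estimate and the parent's, which is exactly the behavior that combats sparsity for rare items.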
Focused Matrix Factorization for Audience Selection in Display Advertising
Sandeep Pandey
Vanja Josifovski
Lluis Garcia-Pueyo
Jeff Yuan
Proceedings of the 29th International Conference on Data Engineering (ICDE) (2013)
Audience selection is a key problem in display advertising systems, in which we need to select a list of users who are interested in (i.e., most likely to buy from) an advertising campaign. The users' past feedback on this campaign can be leveraged to construct such a list using collaborative filtering techniques such as matrix factorization. However, the user-campaign interaction is typically extremely sparse, hence conventional matrix factorization does not perform well. Moreover, simply combining the users' feedback from all campaigns does not address this, since it dilutes the focus on the target campaign in consideration. To resolve these issues, we propose a novel focused matrix factorization model (FMF) which learns users' preferences towards the specific campaign products while also exploiting information about related products. We exploit the product taxonomy to discover related campaigns, and design models to discriminate between users' interest in campaign products and non-campaign products. We develop a parallel multi-core implementation of the FMF model and evaluate its performance over a real-world advertising dataset spanning more than a million products. Our experiments demonstrate the benefits of using our models over existing approaches.
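One way to picture the focused model: a user's score for an in-campaign item combines a globally shared factor (learned from all campaigns, including related ones found via the taxonomy) with a campaign-specific correction, while out-of-campaign items see only the shared factor. The decomposition below is an illustrative sketch under invented dimensions, not the exact FMF parameterization.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4
u_shared = rng.normal(size=k)  # user taste learned across all campaigns
u_focus = rng.normal(size=k)   # correction learned from the target campaign
item = rng.normal(size=k)      # item factor


def score(u_shared, u_focus, item, in_campaign):
    # In-campaign items see the focused user representation; other items
    # only the shared one, so related-campaign signal helps without
    # diluting the focus on the target campaign.
    u = u_shared + u_focus if in_campaign else u_shared
    return float(u @ item)


# Audience selection then amounts to ranking users by their in-campaign
# scores and taking the top of the list.
```

The additive split lets the sparse target-campaign feedback train only the small correction term, while the data-hungry shared term is fit on the much denser union of related campaigns.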