Sumit Kumar Sanghai
Authored Publications
Decoding methods for large language models often trade off between diversity of outputs and parallelism of computation.
Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize.
Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others) are embarrassingly parallel, but have no guarantees about duplicate samples.
We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model, compatible with common sampling variations, with provable beam diversity under certain conditions, as well as being embarrassingly parallel and providing unbiased and consistent expectations from the original model.
We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%.
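The core idea can be pictured with a toy example. The sketch below uses a fixed next-token distribution as a stand-in for a real language model; it maps evenly spaced code points in [0, 1) to sequences by repeatedly selecting the token whose cumulative-probability interval contains the code, then rescaling the code within that interval. All names and the toy vocabulary are illustrative assumptions, not the paper's implementation.

```python
import random

# Toy "language model": a fixed next-token distribution regardless of
# prefix (purely illustrative; a real model would condition on the prefix).
VOCAB = [("a", 0.5), ("b", 0.3), ("<eos>", 0.2)]

def decode_from_code(u, max_len=5):
    """Map a code point u in [0, 1) to a sequence via the implicit
    arithmetic code book: at each step, pick the token whose CDF
    interval contains u, then rescale u within that interval."""
    seq = []
    for _ in range(max_len):
        lo = 0.0
        for tok, p in VOCAB:
            hi = lo + p
            if u < hi:
                break
            lo = hi
        if tok == "<eos>":
            break
        seq.append(tok)
        u = (u - lo) / (hi - lo)  # zoom into the chosen interval
    return "".join(seq)

def arithmetic_sample(k, rng=random):
    """Draw k samples: one shared uniform offset, then k evenly spaced
    code points, each decoded independently (hence embarrassingly parallel)."""
    offset = rng.random()
    return [decode_from_code((i + offset) / k) for i in range(k)]

samples = arithmetic_sample(4, random.Random(0))
```

Because the code points are evenly spaced, distinct points that land in different intervals of the code book are guaranteed to decode to different sequences, while the shared random offset keeps each individual sample unbiased.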
ShopTalk: A System for Conversational Faceted Search
D. Sivakumar
Ebenezer Omotola Anjorin
Gurmeet Singh Manku
Ilya Eckstein
James Patrick Lee-Thorp
Jim Rosswog
Jingchen Feng
Joshua Ainslie
Larry Adams
Michael Anthony Pohl
Sudeep Gandhe
Zach Pearson
SIGIR eCom '22 (2022)
We present ShopTalk, a multi-turn conversational faceted search system for Shopping that is designed to handle large and complex schemas beyond the scope of state-of-the-art slot-filling systems. ShopTalk decouples dialog management from fulfillment, thereby allowing the dialog understanding system to be domain-agnostic and not tied to the particular Shopping application. The dialog understanding system consists of a deep-learned Contextual Language Understanding module, which interprets user utterances, and a primarily rules-based Dialog-State Tracker (DST), which updates the dialog state and formulates search requests intended for the fulfillment engine. The interface between the two modules consists of a minimal set of domain-agnostic "intent operators," which instruct the DST on how to update the dialog state. ShopTalk was deployed in 2020 on the Google Assistant for Shopping searches.
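A minimal sketch of the operator-driven DST pattern described above: the understanding module emits domain-agnostic operators, and a rules-based tracker applies them to the dialog state and formulates a search request. The operator names (`ADD_FILTER`, `REMOVE_FILTER`, `RESET`) and all function names here are hypothetical, since the abstract does not enumerate the actual operator set.

```python
from dataclasses import dataclass, field

@dataclass
class DialogState:
    # Accumulated facet constraints, e.g. {"color": "red"}.
    filters: dict = field(default_factory=dict)

def apply_operator(state, op, facet=None, value=None):
    """Rules-based DST update: interpret a domain-agnostic intent
    operator and mutate the dialog state accordingly (hypothetical
    operator names, not ShopTalk's actual set)."""
    if op == "ADD_FILTER":
        state.filters[facet] = value
    elif op == "REMOVE_FILTER":
        state.filters.pop(facet, None)
    elif op == "RESET":
        state.filters.clear()
    return state

def to_search_request(state):
    # Formulate a fulfillment request from the accumulated state.
    return {"query_filters": dict(state.filters)}
```

Keeping the operators free of Shopping-specific semantics is what lets the understanding module stay domain-agnostic: only the fulfillment engine interprets the facets.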
MAVE: A Product Dataset for Multi-source Attribute Value Extraction
Li Yang
Qifan Wang
Zac Yu
Anand Kulkarni
Bin Shu
Jon Elsas
WSDM 2022 (2022)
Attribute value extraction refers to the task of identifying values of an attribute of interest from product information. Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product ranking, retrieval, and recommendations. In the real world, however, the attribute values of a product are usually incomplete and vary over time, which greatly hinders practical applications.
In this paper, we introduce MAVE, a new dataset to better facilitate research on product attribute value extraction. MAVE is composed of a curated set of 2.2 million products from Amazon pages, with 3 million attribute-value annotations across 1257 unique categories. MAVE has four main advantages: First, MAVE is the largest product attribute value extraction dataset to date, with 8x more attribute-value examples than prior datasets. Second, MAVE includes multi-source representations of each product, which capture the full product information with high attribute coverage. Third, MAVE represents a more diverse set of attributes and values than previous datasets cover.
Lastly, MAVE provides a very challenging zero-shot test set, as we empirically illustrate in the experiments. We further propose a novel approach that effectively extracts the attribute value from the multi-source product information. We conduct extensive experiments with several baselines and show that MAVE is challenging for the latest attribute value extraction models, especially on zero-shot attribute extraction.
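To make the multi-source idea concrete, here is a hypothetical record shape: several textual sources per product, with each attribute-value annotation grounded in one source. The field names are illustrative, not the released MAVE schema.

```python
# Hypothetical multi-source product record (illustrative field names,
# not the actual MAVE release format).
example = {
    "category": "Shoes",
    "sources": [
        {"source": "title", "text": "Acme Trailblazer Waterproof Hiking Boot"},
        {"source": "description", "text": "A rugged boot with a leather upper."},
    ],
    "attributes": [
        {"key": "Type", "value": "Hiking Boot", "source": "title"},
        {"key": "Material", "value": "leather", "source": "description"},
    ],
}

def values_grounded(ex):
    """Sanity check: every annotated value occurs verbatim (case-insensitive)
    in the source text it is attributed to."""
    texts = {s["source"]: s["text"].lower() for s in ex["sources"]}
    return all(a["value"].lower() in texts[a["source"]] for a in ex["attributes"])
```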
Generate-and-Retrieve: Use Your Predictions to Improve Retrieval for Semantic Parsing
Ice Pasupat
Joshua Ainslie
Linlu Qiu
Michiel de Jong
Yury Zemlyanskiy
Proceedings of COLING (2022)
A common recent approach to semantic parsing augments sequence-to-sequence models by retrieving and appending a set of training samples, called exemplars. The effectiveness of this recipe is limited by the ability to retrieve informative exemplars that help produce the correct parse, which is especially challenging in low-resource settings. Existing retrieval is commonly based on similarity of query and exemplar inputs. We propose GandR, a retrieval procedure that retrieves exemplars for which outputs are also similar. GandR first generates a preliminary prediction with input-based retrieval. Then, it retrieves exemplars with outputs similar to the preliminary prediction which are used to generate a final prediction. GandR sets the state of the art on multiple low-resource semantic parsing tasks.
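The two-stage loop can be sketched as follows. Bag-of-words overlap stands in for the paper's retriever, and a stub "model" that copies its top exemplar's output stands in for the seq2seq parser; both are assumptions for illustration only.

```python
def similarity(a, b):
    """Toy retrieval score: Jaccard overlap of whitespace tokens
    (a stand-in for a learned retriever)."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(1, len(ta | tb))

def retrieve(query, pool, key, k=2):
    # Rank training exemplars by similarity of `query` to their `key` field.
    return sorted(pool, key=lambda ex: -similarity(query, ex[key]))[:k]

def copy_top_exemplar(query, exemplars):
    # Stub parser: just copy the top exemplar's output.
    return exemplars[0]["output"]

def gandr(query, train_set, model, k=2):
    # Stage 1: input-based retrieval -> preliminary prediction.
    prelim_exemplars = retrieve(query, train_set, key="input", k=k)
    prelim = model(query, prelim_exemplars)
    # Stage 2: retrieve exemplars whose *outputs* resemble the
    # preliminary prediction, then predict again.
    final_exemplars = retrieve(prelim, train_set, key="output", k=k)
    return model(query, final_exemplars)
```

The point of stage 2 is that output-side similarity can surface exemplars whose parses share structure with the intended parse even when their inputs look different from the query.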
Learning to Extract Attribute Value from Product via Question Answering: A Multi-task Approach
Qifan Wang
Li Yang
D. Sivakumar
Bin Shu
Zac Yu
Jon Elsas
SIGKDD 2020 (2020)
Attribute value extraction refers to the task of identifying values of an attribute of interest from product information.
It is an important research topic that has been widely studied in e-Commerce and relation learning.
Existing attribute value extraction methods suffer from two main limitations: limited scalability and limited generalizability.
Most existing methods treat each attribute independently and build separate models for each of them, which are not suitable for large scale attribute systems in real-world applications.
Moreover, very limited research has focused on generalizing extraction to new attributes.
In this work, we propose a novel approach for Attribute Value Extraction via Question Answering (AVEQA) using a multi-task framework.
In particular, we build a question answering model which treats each attribute as a question and identifies the answer span corresponding to the attribute value in the product context.
A single BERT contextual encoder is shared across all attributes to encode both the context and the question, which makes the model scalable.
A distilled masked language model with knowledge distillation loss is introduced to improve the model generalization ability. In addition, we employ a no-answer classifier to explicitly handle the cases where there are no values for a given attribute in the product context.
The question answering, distilled masked language model, and no-answer classification components are then combined into a unified multi-task framework.
We conduct extensive experiments on a public dataset. The results demonstrate that the proposed approach outperforms several state-of-the-art methods by a large margin.
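The QA framing above can be illustrated with a minimal stand-in: the attribute plays the role of the question, and we either return a value span from the product context or `None`, mirroring the no-answer classifier. The lookup-based "span head" and all names below are illustrative assumptions, not the paper's BERT model.

```python
def extract_value(attribute, context, known_values):
    """Toy stand-in for the span prediction head: scan the product
    context for any candidate value of this attribute and return the
    matching span; None mirrors the no-answer classifier firing."""
    for value in known_values.get(attribute, []):
        start = context.lower().find(value.lower())
        if start != -1:
            return context[start:start + len(value)]
    return None
```

In the actual model the span boundaries are predicted jointly from the encoded (question, context) pair, so no value gazetteer is needed; the sketch only shows the input/output contract of the QA formulation.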
ETC: Encoding Long and Structured Inputs in Transformers
Anirudh Ravula
Joshua Ainslie
Li Yang
Qifan Wang
Vaclav Cvicek
2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
Transformer models have advanced the state of the art in many NLP tasks. In this paper, we present a new Transformer architecture, Extended Transformer Construction (ETC), that addresses two key limitations of existing architectures, namely: scaling input length, and ingesting structured inputs. The main innovation is a new global-local attention mechanism between a global memory and the input tokens, which allows scaling attention to longer inputs. We show that combining global-local attention with relative position encodings and a Contrastive Predictive Coding (CPC) pre-training task allows ETC to naturally handle structured data. We achieve new state-of-the-art results on two natural language datasets requiring long and/or structured inputs.
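The global-local attention pattern can be visualized as a boolean mask: global memory tokens attend to (and are attended by) everything, while long-input tokens attend only within a local window plus the global tokens. The function name, token ordering (globals first), and window radius below are illustrative choices, not ETC's exact implementation.

```python
def global_local_mask(n_global, n_local, radius):
    """Build an (n_global + n_local) square boolean attention mask:
    True means position i may attend to position j. Global tokens
    come first; local tokens use a +/- radius window."""
    n = n_global + n_local
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < n_global or j < n_global:
                # Any pair involving a global token is allowed.
                mask[i][j] = True
            else:
                # Long-input tokens: local sliding window only.
                mask[i][j] = abs(i - j) <= radius
    return mask
```

Because each local token attends to O(radius + n_global) positions instead of O(n), attention cost grows linearly in input length, which is what allows scaling to longer inputs.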