Jump to Content
Raghav Gupta

Raghav Gupta

I received my master's (2017) and bachelor's (2015) degrees in computer science from Stanford and IIT Bombay respectively. At Google, I work on deep learning for task-oriented dialogue systems, focusing on spoken language understanding, efficient on-device models and zero-shot learning from minimal data.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Preview abstract Conversational recommendation systems (CRS) aim to recommend suitable items to users through natural language conversation. However, most CRS approaches do not effectively utilize the signal provided by these conversations. They rely heavily on explicit external knowledge e.g., knowledge graphs to augment the models' understanding of the items and attributes, which is quite hard to scale. To alleviate this, we propose an alternative information retrieval (IR)-styled approach to the CRS item recommendation task, where we represent conversations as queries and items as documents to be retrieved. We expand the document representation used for retrieval with conversations from the training set. With a simple BM25-based retriever, we show that our task formulation compares favorably with much more complex baselines using complex external knowledge on a popular CRS benchmark. We demonstrate further improvements using user-centric modeling and data augmentation to counter the cold start problem for CRSs. View details
    Preview abstract Task-oriented dialogue (TOD) systems are required to identify key information from conversations for the completion of given tasks. Such information is conventionally specified in terms of intents and slots contained in task-specific ontology or schemata. Since these schemata are designed by system developers, the naming convention for slots and intents is not uniform across tasks, and may not convey their semantics effectively. This can lead to models memorizing arbitrary patterns in data, resulting in suboptimal performance and generalization. In this paper, we propose that schemata should be modified by replacing names or notations entirely with natural language descriptions. We show that a language description-driven system exhibits better understanding of task specifications, higher performance on state tracking, improved data efficiency, and effective zero-shot transfer to unseen tasks. Following this paradigm, we present a simple yet effective Description-Driven Dialog State Tracking (D3ST) model, which relies purely on schema descriptions and an "index-picking" mechanism. We demonstrate the superiority in quality, data efficiency and robustness of our approach as measured on the MultiWOZ (Budzianowski et al.,2018), SGD (Rastogi et al., 2020), and the recent SGD-X (Lee et al., 2021) benchmarks. View details
    SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems
    Bin Zhang
    AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (2022)
    Preview abstract Zero/few-shot transfer to unseen services is a critical challenge in task-oriented dialogue research. The Schema-Guided Dialogue (SGD) dataset introduced a paradigm for enabling models to support any service in zero-shot through schemas, which describe service APIs to models in natural language. We explore the robustness of dialogue systems to linguistic variations in schemas by designing SGD-X - a benchmark extending SGD with semantically similar yet stylistically diverse variants for every schema. We observe that two top state tracking models fail to generalize well across schema variants, measured by joint goal accuracy and a novel metric for measuring schema sensitivity. Additionally, we present a simple model-agnostic data augmentation method to improve schema robustness. View details
    Preview abstract Building universal dialogue systems that operate across multiple domains/APIs and generalize to new ones with minimal overhead is a critical challenge. Recent works have leveraged natural language descriptions of schema elements to enable such systems; however, descriptions only indirectly convey schema semantics. In this work, we propose Show, Don't Tell, which prompts seq2seq models with a labeled example dialogue to show the semantics of schema elements rather than tell the model through descriptions. While requiring similar effort from service developers as generating descriptions, we show that using short examples as schema representations with large language models results in state-of-the-art performance on two popular dialogue state tracking benchmarks designed to measure zero-shot generalization - the Schema-Guided Dialogue dataset and the MultiWOZ leave-one-out benchmark. View details
    Schema-Guided Dialogue State Tracking Task at DSTC8
    Xiaoxue Zang
    Srinivas Kumar Sunkara
    Pranav Khaitan
    AAAI Dialog System Technology Challenges Workshop (2020) (to appear)
    Preview abstract This paper gives an overview of the Schema-Guided Dialogue State Tracking task of the 8th Dialogue System Technology Challenge. The goal of this task is to develop dialogue state tracking models suitable for large-scale virtual assistants, with a focus on data-efficient joint modeling across domains and zero-shot generalization to new APIs. This task provided a new dataset consisting of over 16000 dialogues in the training set spanning 16 domains to highlight these challenges, and a baseline model capable of zero-shot generalization to new APIs. Twenty-five teams participated, developing a range of neural network models, exceeding the performance of the baseline model by a very high margin. The submissions incorporated a variety of pre-trained encoders and data augmentation techniques. This paper describes the task definition, dataset and evaluation methodology. We also summarize the approach and results of the submitted systems to highlight the overall trends in the state-of-the-art. View details
    MultiWOZ 2.2 : A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines
    Xiaoxue Zang
    Srinivas Sunkara
    Jianguo Zhang
    Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI (2020), pp. 109-117
    Preview abstract MultiWOZ is a well-known task-oriented dialogue dataset containing over 10,000 annotated dialogues spanning 8 domains. It is extensively used as a benchmark for dialogue state tracking. However, recent works have reported presence of substantial noise in the dialogue state annotations. MultiWOZ 2.1 identified and fixed many of these erroneous annotations and user utterances, resulting in an improved version of this dataset. This work introduces MultiWOZ 2.2, which is a yet another improved version of this dataset. Firstly, we identify and fix dialogue state annotation errors across 17.3% of the utterances on top of MultiWOZ 2.1. Secondly, we redefine the ontology by disallowing vocabularies of slots with a large number of possible values (e.g., restaurant name, time of booking). In addition, we introduce slot span annotations for these slots to standardize them across recent models, which previously used custom string matching heuristics to generate them. We also benchmark a few state of the art dialogue state tracking models on the corrected dataset to facilitate comparison for future work. In the end, we discuss best practices for dialogue data collection that can help avoid annotation errors. View details
    Robust Zero-Shot Cross-Domain Slot Filling with Example Values
    Darsh J Shah
    Amir Ali Fayazi
    Dilek Hakkani-Tur
    Proceedings of ACL, 2019
    Preview abstract An increasing number of task-oriented dia-logue systems now rely on deep learning-based slot filling models, usually needing largeamounts of labeled training data for the targetdomain. However, often, either little to no tar-get domain training data is available, or thetraining and target domain schemas are mis-aligned, as is common for web forms on simi-lar websites. Prior approaches to zero-shot slotfilling use slot descriptions to learn concepts,which are not robust to misaligned schemas.In this work, we propose utilizing both theslot description and a small number of exam-ples of slot values, which may be easily avail-able, to learn semantic representations of slotswhich are transferable across domains and ro-bust to misaligned schemas. Our experimentsshow improved slot filling performance overstate-of-the-art models on two multi-domaindatasets, in the regular and low-data settings. View details
    Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset
    Xiaoxue Zang
    Srinivas Kumar Sunkara
    Pranav Khaitan
    arXiv preprint arXiv:1909.05855 (2019)
    Preview abstract Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a zero-shot dialogue state tracking model that achieves state-of-the-art performance on recent benchmark datasets. View details
    Preview abstract This paper presents a novel approach for multi-task learning of language understanding (LU) and dialogue state tracking (DST) in task-oriented dialogue systems. Multi-task training enables the sharing of lower layers of the neural network and improves the performance of LU and DST while reducing the number of network parameters. In our proposed framework, DST operates on a set of candidate values for each slot that has been mentioned so far. These candidate sets are generated using LU slot annotations for the current user utterance, dialogue acts corresponding to the preceding system utterance and the dialogue state estimated for the previous turn, enabling DST to handle slots with a large or unbounded set of possible values and deal with slot values not seen during training. Furthermore, to bridge the gap between training and inference, we investigate the use of scheduled sampling on LU output for the current user utterance as well as the DST output for the preceding turn. View details
    Preview abstract In task-oriented dialogue systems, spoken language understanding, or SLU, refers to the task of parsing natural language user utterances into semantic frames. Making use of context from prior dialogue history holds the key to more effective SLU. State of the art approaches to SLU use memory networks to encode context by processing multiple utterances from the dialogue at each turn, resulting in significant trade-offs between accuracy and computational efficiency. On the other hand, downstream components like the dialogue state tracker (DST) already keep track of the dialogue state, which can serve as a summary of the dialogue history. In this work, we propose novel approaches that use an embedded representation of the dialogue state as context for SLU. More specifically, our architecture includes a separate recurrent neural network (RNN) based encoding module that accumulates dialogue context to guide the frame parsing sub-tasks and can be shared between SLU and DST. In our experiments, we demonstrate the effectiveness of our approach on dialogues from two domains. View details
    A Fast Unified Model for Parsing and Sentence Understanding
    Samuel R. Bowman
    Jon Gauthier
    Christopher D. Manning
    Christopher Potts
    Proceedings of ACL (2016)
    Optimal cost almost-sure reachability in POMDPs
    Krishnendu Chatterjee
    Martin Chmelík
    Ayush Kanodia
    AAAI (2015)
    Qualitative analysis of POMDPs with temporal logic specifications for robotics applications
    Krishnendu Chatterjee
    Martin Chmelík
    Ayush Kanodia
    ICRA (2015)
    Biological auctions with multiple rewards
    Johannes G. Reiter
    Ayush Kanodia
    Martin A. Nowak
    Krishnendu Chatterjee
    Proceedings of the Royal Society B: Biological Sciences, vol. 282 (2015)