Raghav Gupta

I received my master's (2017) and bachelor's (2015) degrees in computer science from Stanford and IIT Bombay, respectively. At Google, I work on deep learning for task-oriented dialogue systems, focusing on spoken language understanding, efficient on-device models, and zero-shot learning from minimal data.
Authored Publications
    Conversational recommendation systems (CRS) aim to recommend suitable items to users through natural language conversation. However, most CRS approaches do not effectively utilize the signal provided by these conversations. They rely heavily on explicit external knowledge, e.g., knowledge graphs, to augment the models' understanding of the items and attributes, which is quite hard to scale. To alleviate this, we propose an alternative information retrieval (IR)-styled approach to the CRS item recommendation task, where we represent conversations as queries and items as documents to be retrieved. We expand the document representation used for retrieval with conversations from the training set. With a simple BM25-based retriever, we show that our task formulation compares favorably with much more complex baselines that use external knowledge on a popular CRS benchmark. We demonstrate further improvements using user-centric modeling and data augmentation to counter the cold-start problem for CRSs.
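The retrieval formulation above can be illustrated with a small Okapi BM25 ranker. This is a minimal sketch, not the authors' implementation: the regex tokenizer, the k1 = 1.2 and b = 0.75 defaults, the Lucene-style idf, and the toy `items` and `training_conversations` data are all assumptions made for illustration. Each item's document is its metadata text expanded with training-set conversations that mention it, and a new conversation is used directly as the query.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is dropped (an illustrative choice).
    return re.findall(r"\w+", text.lower())


class BM25:
    """Okapi BM25 over a small in-memory collection of item documents."""

    def __init__(self, docs: dict[str, str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lengths = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        self.avgdl = sum(self.lengths.values()) / len(self.lengths)
        # Document frequency: number of documents containing each term.
        self.df = Counter()
        for tf in self.tfs.values():
            self.df.update(tf.keys())
        self.n_docs = len(docs)

    def idf(self, term: str) -> float:
        # The "+1" inside the log keeps idf non-negative for very common terms.
        n_t = self.df.get(term, 0)
        return math.log(1 + (self.n_docs - n_t + 0.5) / (n_t + 0.5))

    def score(self, query: str, doc_id: str) -> float:
        tf, dl = self.tfs[doc_id], self.lengths[doc_id]
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def rank(self, query: str) -> list[str]:
        return sorted(self.tfs, key=lambda d: self.score(query, d), reverse=True)


# Hypothetical toy catalogue: item metadata plus training conversations about each item.
items = {
    "Inception": "Inception sci-fi thriller directed by Christopher Nolan",
    "The Notebook": "The Notebook romance drama",
}
training_conversations = {
    "Inception": ["I want a mind-bending movie about dreams"],
    "The Notebook": ["Something romantic that will make me cry"],
}
# Document expansion: append training conversations to each item's text.
docs = {
    item: text + " " + " ".join(training_conversations.get(item, []))
    for item, text in items.items()
}
retriever = BM25(docs)
print(retriever.rank("Can you suggest a movie about dreams?"))  # ['Inception', 'The Notebook']
```

In this toy example the query shares no words with the Inception metadata alone; it is the expansion with a training conversation that lets BM25 match it.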
    SGD-X: A Benchmark for Robust Generalization in Schema-Guided Dialogue Systems
    Yuan Cao
    Bin Zhang
    AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (2022)
    Zero/few-shot transfer to unseen services is a critical challenge in task-oriented dialogue research. The Schema-Guided Dialogue (SGD) dataset introduced a paradigm for enabling models to support any service in zero-shot through schemas, which describe service APIs to models in natural language. We explore the robustness of dialogue systems to linguistic variations in schemas by designing SGD-X, a benchmark extending SGD with semantically similar yet stylistically diverse variants for every schema. We observe that two top state tracking models fail to generalize well across schema variants, measured by joint goal accuracy and a novel metric for measuring schema sensitivity. Additionally, we present a simple model-agnostic data augmentation method to improve schema robustness.
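Joint goal accuracy, one of the two measures named above, counts a turn as correct only if the predicted dialogue state matches the reference state for every slot. A minimal sketch, assuming each turn's state is a plain dict from slot name to value and that values are compared by exact string match (official evaluation scripts may add normalization or fuzzy matching). Scoring the same model under the original schema and a reworded variant shows the kind of gap SGD-X is designed to expose; the predictions below are made up for illustration.

```python
def joint_goal_accuracy(predicted: list[dict[str, str]], gold: list[dict[str, str]]) -> float:
    """Fraction of turns whose predicted state exactly matches the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        raise ValueError("no turns to evaluate")
    correct = sum(pred == ref for pred, ref in zip(predicted, gold))
    return correct / len(gold)


gold = [
    {"city": "Paris"},
    {"city": "Paris", "date": "Friday"},
]
# Hypothetical outputs of one model under the original and a paraphrased schema.
pred_original = [{"city": "Paris"}, {"city": "Paris", "date": "Friday"}]
pred_variant = [{"city": "Paris"}, {"city": "Paris"}]  # misses the reworded "date" slot

print(joint_goal_accuracy(pred_original, gold))  # 1.0
print(joint_goal_accuracy(pred_variant, gold))   # 0.5
```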
    Building universal dialogue systems that operate across multiple domains/APIs and generalize to new ones with minimal overhead is a critical challenge. Recent works have leveraged natural language descriptions of schema elements to enable such systems; however, descriptions only indirectly convey schema semantics. In this work, we propose Show, Don't Tell, which prompts seq2seq models with a labeled example dialogue to show the semantics of schema elements rather than tell the model through descriptions. While requiring similar effort from service developers as generating descriptions, we show that using short examples as schema representations with large language models results in state-of-the-art performance on two popular dialogue state tracking benchmarks designed to measure zero-shot generalization: the Schema-Guided Dialogue dataset and the MultiWOZ leave-one-out benchmark.
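To make the "show" idea concrete, the sketch below assembles a text prompt in which one labeled example dialogue stands in for slot descriptions, followed by the dialogue whose state should be predicted. The bracketed markers, the `slot=value` layout, and the function names `format_dialogue` and `build_sdt_prompt` are hypothetical choices for illustration; the exact prompt format used in the paper may differ, and the seq2seq model that would consume the prompt is not shown.

```python
def format_dialogue(turns: list[tuple[str, str]]) -> str:
    # Each turn is (speaker, utterance), e.g. ("user", "Book a table for two").
    return " ".join(f"[{speaker}] {utterance}" for speaker, utterance in turns)


def build_sdt_prompt(
    example_turns: list[tuple[str, str]],
    example_state: dict[str, str],
    target_turns: list[tuple[str, str]],
) -> str:
    """Prompt = labeled example dialogue (shows slot semantics) + target dialogue."""
    state = " ".join(f"{slot}={value}" for slot, value in example_state.items())
    return (
        f"[example] {format_dialogue(example_turns)} "
        f"[state] {state} "
        f"[conversation] {format_dialogue(target_turns)}"
    )


prompt = build_sdt_prompt(
    example_turns=[("user", "Table for two at 7pm please")],
    example_state={"party_size": "2", "time": "7pm"},
    target_turns=[("user", "Can I get a table for four at noon?")],
)
print(prompt)
```

In this layout the developer's effort is one short annotated dialogue per service instead of one description per slot, which mirrors the trade-off the abstract describes.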
    Task-oriented dialogue (TOD) systems are required to identify key information from conversations for the completion of given tasks. Such information is conventionally specified in terms of intents and slots contained in task-specific ontology or schemata. Since these schemata are designed by system developers, the naming convention for slots and intents is not uniform across tasks, and may not convey their semantics effectively. This can lead to models memorizing arbitrary patterns in data, resulting in suboptimal performance and generalization. In this paper, we propose that schemata should be modified by replacing names or notations entirely with natural language descriptions. We show that a language description-driven system exhibits better understanding of task specifications, higher performance on state tracking, improved data efficiency, and effective zero-shot transfer to unseen tasks. Following this paradigm, we present a simple yet effective Description-Driven Dialog State Tracking (D3ST) model, which relies purely on schema descriptions and an "index-picking" mechanism. We demonstrate the superiority in quality, data efficiency and robustness of our approach as measured on the MultiWOZ (Budzianowski et al., 2018), SGD (Rastogi et al., 2020), and the recent SGD-X (Lee et al., 2021) benchmarks.
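The "index-picking" mechanism can be sketched as string construction and parsing: each slot description is prefixed with an integer index, the model reads the indexed descriptions followed by the dialogue context, and it is trained to output `index=value` pairs, which are then mapped back to slot names. The separators, the `build_d3st_input` and `parse_d3st_output` names, and the example slots are assumptions for illustration rather than the paper's exact format; intents and categorical-value options are omitted.

```python
def build_d3st_input(slot_descriptions: dict[str, str], context: str) -> tuple[str, list[str]]:
    """Return the model input string and the index -> slot-name lookup list."""
    slot_names = list(slot_descriptions)
    prefix = " ".join(f"{i}:{slot_descriptions[name]}" for i, name in enumerate(slot_names))
    return f"{prefix} [context] {context}", slot_names


def parse_d3st_output(output: str, slot_names: list[str]) -> dict[str, str]:
    """Map a target such as '0=Paris; 1=Friday' back to named slot values."""
    state = {}
    for pair in filter(None, (p.strip() for p in output.split(";"))):
        index, _, value = pair.partition("=")
        state[slot_names[int(index)]] = value.strip()
    return state


descriptions = {
    "dest_city": "city the user is travelling to",
    "depart_date": "date of departure",
}
model_input, names = build_d3st_input(descriptions, "[user] I need a flight to Paris on Friday")
print(model_input)
# A trained seq2seq model would generate the target string; here it is written by hand.
print(parse_d3st_output("0=Paris; 1=Friday", names))  # {'dest_city': 'Paris', 'depart_date': 'Friday'}
```

Because the model refers to slots only through indices and their descriptions, the slot names themselves never reach it; supplying new descriptions is, in this sketch, all that is needed to target an unseen schema.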
    MultiWOZ 2.2 : A Dialogue Dataset with Additional Annotation Corrections and State Tracking Baselines
    Xiaoxue Zang
    Srinivas Sunkara
    Jianguo Zhang
    Proceedings of the 2nd Workshop on Natural Language Processing for Conversational AI (2020), pp. 109-117
    MultiWOZ is a well-known task-oriented dialogue dataset containing over 10,000 annotated dialogues spanning 8 domains. It is extensively used as a benchmark for dialogue state tracking. However, recent works have reported the presence of substantial noise in the dialogue state annotations. MultiWOZ 2.1 identified and fixed many of these erroneous annotations and user utterances, resulting in an improved version of this dataset. This work introduces MultiWOZ 2.2, which is yet another improved version of this dataset. First, we identify and fix dialogue state annotation errors across 17.3% of the utterances on top of MultiWOZ 2.1. Second, we redefine the ontology by disallowing vocabularies of slots with a large number of possible values (e.g., restaurant name, time of booking). In addition, we introduce slot span annotations for these slots to standardize them across recent models, which previously used custom string matching heuristics to generate them. We also benchmark a few state-of-the-art dialogue state tracking models on the corrected dataset to facilitate comparison for future work. Finally, we discuss best practices for dialogue data collection that can help avoid annotation errors.
    Schema-Guided Dialogue State Tracking Task at DSTC8
    Xiaoxue Zang
    Srinivas Kumar Sunkara
    Pranav Khaitan
    AAAI Dialog System Technology Challenges Workshop (2020) (to appear)
    This paper gives an overview of the Schema-Guided Dialogue State Tracking task of the 8th Dialogue System Technology Challenge. The goal of this task is to develop dialogue state tracking models suitable for large-scale virtual assistants, with a focus on data-efficient joint modeling across domains and zero-shot generalization to new APIs. This task provided a new dataset consisting of over 16,000 dialogues in the training set spanning 16 domains to highlight these challenges, and a baseline model capable of zero-shot generalization to new APIs. Twenty-five teams participated, developing a range of neural network models and exceeding the performance of the baseline model by a large margin. The submissions incorporated a variety of pre-trained encoders and data augmentation techniques. This paper describes the task definition, dataset and evaluation methodology. We also summarize the approach and results of the submitted systems to highlight the overall trends in the state of the art.
    Towards Scalable Multi-domain Conversational Agents: The Schema-Guided Dialogue Dataset
    Xiaoxue Zang
    Srinivas Kumar Sunkara
    Pranav Khaitan
    arXiv preprint arXiv:1909.05855 (2019)
    Virtual assistants such as Google Assistant, Alexa and Siri provide a conversational interface to a large number of services and APIs spanning multiple domains. Such systems need to support an ever-increasing number of services with possibly overlapping functionality. Furthermore, some of these services have little to no training data available. Existing public datasets for task-oriented dialogue do not sufficiently capture these challenges since they cover few domains and assume a single static ontology per domain. In this work, we introduce the Schema-Guided Dialogue (SGD) dataset, containing over 16k multi-domain conversations spanning 16 domains. Our dataset exceeds the existing task-oriented dialogue corpora in scale, while also highlighting the challenges associated with building large-scale virtual assistants. It provides a challenging testbed for a number of tasks including language understanding, slot filling, dialogue state tracking and response generation. Along the same lines, we present a schema-guided paradigm for task-oriented dialogue, in which predictions are made over a dynamic set of intents and slots, provided as input, using their natural language descriptions. This allows a single dialogue system to easily support a large number of services and facilitates simple integration of new services without requiring additional training data. Building upon the proposed paradigm, we release a zero-shot dialogue state tracking model that achieves state-of-the-art performance on recent benchmark datasets.
    Robust Zero-Shot Cross-Domain Slot Filling with Example Values
    Darsh J Shah
    Amir Ali Fayazi
    Dilek Hakkani-Tur
    Proceedings of ACL, 2019
    An increasing number of task-oriented dialogue systems now rely on deep learning-based slot filling models, usually needing large amounts of labeled training data for the target domain. However, often, either little to no target domain training data is available, or the training and target domain schemas are misaligned, as is common for web forms on similar websites. Prior approaches to zero-shot slot filling use slot descriptions to learn concepts, which are not robust to misaligned schemas. In this work, we propose utilizing both the slot description and a small number of examples of slot values, which may be easily available, to learn semantic representations of slots which are transferable across domains and robust to misaligned schemas. Our experiments show improved slot filling performance over state-of-the-art models on two multi-domain datasets, in the regular and low-data settings.
    This paper presents a novel approach for multi-task learning of language understanding (LU) and dialogue state tracking (DST) in task-oriented dialogue systems. Multi-task training enables the sharing of lower layers of the neural network and improves the performance of LU and DST while reducing the number of network parameters. In our proposed framework, DST operates on a set of candidate values for each slot that has been mentioned so far. These candidate sets are generated using LU slot annotations for the current user utterance, dialogue acts corresponding to the preceding system utterance and the dialogue state estimated for the previous turn, enabling DST to handle slots with a large or unbounded set of possible values and deal with slot values not seen during training. Furthermore, to bridge the gap between training and inference, we investigate the use of scheduled sampling on LU output for the current user utterance as well as the DST output for the preceding turn.
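The candidate-set construction described above amounts to merging slot values from three sources each turn. A minimal sketch, assuming each source is a dict from slot name to a list of values (the previous dialogue state holds one value per slot) and that values are deduplicated by exact string match; the function name and data layout are hypothetical, and the neural scorer that picks among the candidates is not shown.

```python
def build_candidate_sets(
    lu_slots: dict[str, list[str]],
    system_act_slots: dict[str, list[str]],
    previous_state: dict[str, str],
) -> dict[str, list[str]]:
    """Merge per-slot candidate values from LU, system acts and the previous state."""
    candidates: dict[str, list[str]] = {}
    sources = (
        {slot: [value] for slot, value in previous_state.items()},
        system_act_slots,
        lu_slots,
    )
    for source in sources:
        for slot, values in source.items():
            bucket = candidates.setdefault(slot, [])
            for value in values:
                if value not in bucket:  # keep first occurrence, drop duplicates
                    bucket.append(value)
    return candidates


cands = build_candidate_sets(
    lu_slots={"time": ["8pm"]},              # tagged in the current user utterance
    system_act_slots={"time": ["7pm"]},      # offered by the system in the last turn
    previous_state={"cuisine": "thai", "time": "7pm"},
)
print(cands)  # {'cuisine': ['thai'], 'time': ['7pm', '8pm']}
```

Because the tracker only scores values that appear in these per-slot lists, it is not tied to a fixed value vocabulary, which is what allows it to handle unbounded slots and values unseen in training.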
    In task-oriented dialogue systems, spoken language understanding, or SLU, refers to the task of parsing natural language user utterances into semantic frames. Making use of context from prior dialogue history holds the key to more effective SLU. State-of-the-art approaches to SLU use memory networks to encode context by processing multiple utterances from the dialogue at each turn, resulting in significant trade-offs between accuracy and computational efficiency. On the other hand, downstream components like the dialogue state tracker (DST) already keep track of the dialogue state, which can serve as a summary of the dialogue history. In this work, we propose novel approaches that use an embedded representation of the dialogue state as context for SLU. More specifically, our architecture includes a separate recurrent neural network (RNN) based encoding module that accumulates dialogue context to guide the frame parsing sub-tasks and can be shared between SLU and DST. In our experiments, we demonstrate the effectiveness of our approach on dialogues from two domains.
    A Fast Unified Model for Parsing and Sentence Understanding
    Samuel R. Bowman
    Jon Gauthier
    Christopher D. Manning
    Christopher Potts
    Proceedings of ACL (2016)
    Tree-structured neural networks exploit valuable syntactic parse information as they interpret the meanings of sentences. However, they suffer from two key technical problems that make them slow and unwieldy for large-scale NLP tasks: they usually operate on parsed sentences and they do not directly support batched computation. We address these issues by introducing the Stack-augmented Parser-Interpreter Neural Network (SPINN), which combines parsing and interpretation within a single tree-sequence hybrid model by integrating tree-structured sentence interpretation into the linear sequential structure of a shift-reduce parser. Our model supports batched computation for a speedup of up to 25× over other tree-structured models, and its integrated parser can operate on unparsed data with little loss in accuracy. We evaluate it on the Stanford NLI entailment task and show that it significantly outperforms other sentence-encoding models.
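The shift-reduce view that SPINN builds on can be sketched without any neural components: SHIFT moves the next token from the buffer onto a stack, and REDUCE pops the top two stack entries and pushes their composition. In SPINN the composition is a learned neural function applied to batches of vectors; this unbatched sketch substitutes a hypothetical string-bracketing `bracket` function so the resulting tree is visible, and the function names are illustrative only.

```python
from collections import deque
from typing import Callable, TypeVar

T = TypeVar("T")


def shift_reduce(tokens: list[T], transitions: list[str], compose: Callable[[T, T], T]) -> T:
    """Build a sentence representation by following SHIFT/REDUCE transitions."""
    buffer = deque(tokens)
    stack: list[T] = []
    for op in transitions:
        if op == "SHIFT":
            if not buffer:
                raise ValueError("SHIFT on empty buffer")
            stack.append(buffer.popleft())
        elif op == "REDUCE":
            if len(stack) < 2:
                raise ValueError("REDUCE needs two stack entries")
            right = stack.pop()
            left = stack.pop()
            stack.append(compose(left, right))
        else:
            raise ValueError(f"unknown transition: {op}")
    if buffer or len(stack) != 1:
        raise ValueError("transitions do not describe a complete binary tree")
    return stack[0]


def bracket(left: str, right: str) -> str:
    # Stand-in for a learned composition function: just records the tree shape.
    return f"({left} {right})"


tree = shift_reduce(
    ["the", "cat", "sat"],
    ["SHIFT", "SHIFT", "REDUCE", "SHIFT", "REDUCE"],
    bracket,
)
print(tree)  # ((the cat) sat)
```

A sentence of n tokens always takes n SHIFTs and n − 1 REDUCEs, so every sentence is processed as a flat sequence of 2n − 1 steps; a fixed-shape sequence like this is much easier to batch than a recursive tree traversal.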
    Optimal cost almost-sure reachability in POMDPs
    Krishnendu Chatterjee
    Martin Chmelík
    Ayush Kanodia
    AAAI (2015)
    We consider partially observable Markov decision processes (POMDPs) with a set of target states, in which every transition is associated with an integer cost. The optimization objective we study asks to minimize the expected total cost until the target set is reached, while ensuring that the target set is reached almost-surely (with probability 1). We show that for integer costs, approximating the optimal cost is undecidable. For positive costs, our results are as follows: (i) we establish matching lower and upper bounds for the optimal cost, and the bound is double exponential; (ii) we show that the problem of approximating the optimal cost is decidable and present approximation algorithms building on the existing algorithms for POMDPs with finite-horizon objectives. While the worst-case running time of our algorithm is double exponential, we present efficient stopping criteria for the algorithm and show experimentally that it performs well in many examples.
    Qualitative analysis of POMDPs with temporal logic specifications for robotics applications
    Krishnendu Chatterjee
    Martin Chmelík
    Ayush Kanodia
    ICRA (2015)
    We consider partially observable Markov decision processes (POMDPs), which are a standard framework for robotics applications to model uncertainties present in the real world, with temporal logic specifications. All temporal logic specifications in linear-time temporal logic (LTL) can be expressed as parity objectives. We study the qualitative analysis problem for POMDPs with parity objectives, which asks whether there is a controller (policy) to ensure that the objective holds with probability 1 (almost-surely). While the qualitative analysis of POMDPs with parity objectives is undecidable, recent results show that when restricted to finite-memory policies the problem is EXPTIME-complete. Although the problem is intractable in theory, we present a practical approach to solve the qualitative analysis problem. We designed several heuristics to deal with the exponential complexity, and have used our implementation on a number of well-known POMDP examples for robotics applications. Our results provide the first practical approach to solve the qualitative analysis of robot motion planning with LTL properties in the presence of uncertainty.
    Biological auctions with multiple rewards
    Johannes G. Reiter
    Ayush Kanodia
    Martin A. Nowak
    Krishnendu Chatterjee
    Proceedings of the Royal Society B: Biological Sciences, 282 (2015)
    The competition for resources among cells, individuals or species is a fundamental characteristic of evolution. Biological all-pay auctions have been used to model situations where multiple individuals compete for a single resource. However, in many situations multiple resources with various values exist and single reward auctions are not applicable. We generalize the model to multiple rewards and study the evolution of strategies. In biological all-pay auctions the bid of an individual corresponds to its strategy and is equivalent to its payment in the auction. The decreasingly ordered rewards are distributed according to the decreasingly ordered bids of the participating individuals. The reproductive success of an individual is proportional to its fitness given by the sum of the rewards won minus its payments. Hence, successful bidding strategies spread in the population. We find that the results for the multiple reward case are very different from the single reward case. While the mixed strategy equilibrium in the single reward case with more than two players consists of mostly low-bidding individuals, we show that the equilibrium can convert to many high-bidding individuals and a few low-bidding individuals in the multiple reward case. Some reward values lead to a specialization among the individuals where one subpopulation competes for the rewards and the other subpopulation largely avoids costly competitions. Whether the mixed strategy equilibrium is an evolutionarily stable strategy (ESS) depends on the specific values of the rewards.
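The payoff rule described above (rewards sorted in decreasing order go to bids sorted in decreasing order, and every participant pays its bid) can be written out directly. A minimal sketch; the abstract does not say how ties are resolved, so splitting the tied rewards equally is an assumption here, as are the function name and the example numbers. The evolutionary dynamics that turn these payoffs into reproductive success are not shown.

```python
def all_pay_payoffs(bids: list[float], rewards: list[float]) -> list[float]:
    """Payoff of each bidder: reward won minus its own bid (paid whether or not it wins)."""
    n = len(bids)
    ordered_rewards = sorted(rewards, reverse=True)
    # Bidders beyond the number of rewards win nothing.
    ordered_rewards += [0.0] * max(0, n - len(ordered_rewards))
    order = sorted(range(n), key=lambda i: bids[i], reverse=True)
    payoffs = [0.0] * n
    pos = 0
    while pos < n:
        # Find the block of bidders tied with the bidder at rank `pos`.
        end = pos
        while end + 1 < n and bids[order[end + 1]] == bids[order[pos]]:
            end += 1
        # Assumption: tied bidders split the rewards of their ranks equally.
        share = sum(ordered_rewards[pos:end + 1]) / (end - pos + 1)
        for rank in range(pos, end + 1):
            bidder = order[rank]
            payoffs[bidder] = share - bids[bidder]
        pos = end + 1
    return payoffs


print(all_pay_payoffs([3, 1, 2], [10, 4]))  # [7.0, -1.0, 2.0]
print(all_pay_payoffs([2, 2, 1], [10, 4]))  # [5.0, 5.0, -1.0]
```

In the first example the lowest bidder wins nothing yet still pays its bid, which is the defining feature of an all-pay auction.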