Dipanjan Das

Dipanjan Das is a Research Scientist at Google working on learning semantic representations of language. He received a Ph.D. in 2012 from the Language Technologies Institute, School of Computer Science at Carnegie Mellon University. Before that, he completed an undergraduate degree in Computer Science and Engineering in 2005 from the Indian Institute of Technology, Kharagpur. His work on multilingual learning of sequence models received the best paper award at ACL 2011 and a best paper award honorable mention at EMNLP 2013.

See his personal webpage for more information.
Authored Publications
    Conditional Generation with a Question-Answering Blueprint
    Reinald Kim Amplayo
    Fantine Huot
    Mirella Lapata
    Transactions of the Association for Computational Linguistics (2023) (to appear)
The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural sequence-to-sequence models, whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. Our work proposes a new conceptualization of text plans as a sequence of question-answer (QA) pairs. We enhance existing datasets (e.g., for summarization) with a QA blueprint operating as a proxy for both content selection (i.e., what to say) and planning (i.e., in what order). We obtain blueprints automatically by exploiting state-of-the-art question generation technology and convert input-output pairs into input-blueprint-output tuples. We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output (e.g., as a global plan or iteratively). Evaluation across metrics and datasets demonstrates that blueprint models are more factual than alternatives which do not resort to planning and allow tighter control of the generation output.
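For concreteness, a minimal Python sketch of the blueprint idea: a plan is a sequence of question-answer pairs, serialized alongside the input and target to form an input-blueprint-output training tuple. The serialization format and names below are illustrative assumptions, not the paper's exact scheme.

```python
from dataclasses import dataclass

@dataclass
class QAPair:
    question: str
    answer: str

def to_training_tuple(source: str, blueprint: list[QAPair], target: str) -> dict:
    """Convert an (input, output) pair into an input-blueprint-output tuple."""
    plan = " ".join(f"Q: {qa.question} A: {qa.answer}" for qa in blueprint)
    return {"input": source, "blueprint": plan, "output": target}

example = to_training_tuple(
    source="The Nile is the longest river in Africa ...",
    blueprint=[QAPair("Which river is the longest in Africa?", "The Nile")],
    target="The Nile is Africa's longest river.",
)
print(example["blueprint"])
```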
With recent improvements in natural language generation (NLG) models for various applications, it has become imperative to have the means to identify and evaluate whether NLG output is only sharing verifiable information about the external world. In this work, we present a new evaluation framework entitled Attributable to Identified Sources (AIS) for assessing the output of natural language generation models, when such output pertains to the external world. We first define AIS and introduce a two-stage annotation pipeline for allowing annotators to appropriately evaluate model output according to AIS guidelines. We empirically validate this approach on generation datasets spanning three tasks (two conversational QA datasets, a summarization dataset, and a table-to-text dataset) via human evaluation studies that suggest that AIS could serve as a common framework for measuring whether model-generated statements are supported by underlying sources. We release guidelines for the human evaluation studies.
    Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
    Fantine Huot
    Reinald Kim Amplayo
    Mirella Lapata
    Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations (2023)
While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a browser-based demonstration for query-focused summarization that uses a sequence of question-answer pairs as a blueprint plan for guiding text generation (i.e., what to say and in what order). We illustrate how users may interact with the generated text and associated plan visualizations, e.g., by editing and modifying the blueprint in order to improve or control the generated output.
We introduce Seahorse (SummariEs Annotated with Human Ratings in Six languagEs), a dataset of 96K summaries with ratings along 6 dimensions (comprehensibility, repetition, grammar, attribution, main idea(s), and conciseness). The summaries are generated from 8 different models, conditioned on source text from 4 datasets in 6 languages (German, English, Spanish, Russian, Turkish, and Vietnamese). We release the annotated summaries as a resource for developing better summarization models and automatic metrics. We present an analysis of the dataset's composition and quality, and we demonstrate the potential of this dataset for building better summarization metrics, showing that metrics finetuned with Seahorse data outperform baseline metrics.
Large language models (LLMs) have been shown to perform well in answering questions and in producing long-form texts such as stories and explanations, both in few-shot closed-book settings. While the former can be validated using well-known evaluation metrics, the latter is difficult to evaluate. We therefore investigate the ability of LLMs to do both tasks at once: question answering that requires long-form answers. Such questions tend to be multifaceted, i.e., they may have ambiguities and/or require information from multiple sources. To this end, we define query refinement prompts that encourage LLMs to explicitly express the multifacetedness in questions and generate long-form answers covering multiple facets of the question. Our experiments on two long-form question answering datasets, ASQA and AQuAMuSe, show that using our prompts allows us to outperform fully finetuned models in the closed-book setting, as well as achieve results comparable to retrieve-then-generate open-book models.
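A minimal sketch of what such a query refinement prompt might look like; the template wording and the `call_llm` placeholder are assumptions for illustration, not the prompts used in the paper.

```python
# Illustrative query refinement prompt: ask the model to surface the
# facets of a multifaceted question before answering it.
REFINE_TEMPLATE = (
    "Question: {question}\n"
    "Before answering, list the distinct facets this question asks about "
    "(ambiguities, sub-questions), then write a long-form answer covering "
    "each facet.\n"
    "Facets:"
)

def answer_multifaceted(question: str, call_llm) -> str:
    """call_llm: any function mapping a prompt string to generated text."""
    return call_llm(REFINE_TEMPLATE.format(question=question))

# e.g., answer_multifaceted("Who invented the telephone?", my_model_api)
```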
Experiments with pretrained models such as BERT are often based on a single checkpoint. While the conclusions drawn apply to the artifact (i.e., the particular instance of the model), it is not always clear whether they hold for the more general procedure (which includes the model architecture, training data, initialization scheme, and loss function). Recent work has shown that re-running pretraining can lead to substantially different conclusions about performance, suggesting that alternative evaluations are needed to make principled statements about procedures. To address this, we introduce MultiBERTs: a set of 25 BERT-base checkpoints, trained with similar hyper-parameters as the original BERT model but differing in random initialization and data shuffling. The aim is to enable researchers to draw robust and statistically justified conclusions about pretraining procedures. The full release includes 25 fully trained checkpoints, as well as statistical guidelines and a code library implementing our recommended hypothesis testing methods. Finally, for five of these models we release a set of 28 intermediate checkpoints in order to support research on learning dynamics.
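As an illustration of the kind of seed-aware analysis such a release supports, here is a minimal paired bootstrap over seeds for the mean score difference between two pretraining procedures; this is a generic sketch, not the exact procedure from the accompanying statistical guidelines.

```python
import random

def bootstrap_seed_test(scores_a, scores_b, n_boot=10_000, rng_seed=0):
    """Two-sided paired bootstrap over pretraining seeds."""
    assert len(scores_a) == len(scores_b)
    n = len(scores_a)
    rng = random.Random(rng_seed)
    observed = sum(scores_a) / n - sum(scores_b) / n
    flips = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample seeds with replacement
        diff = sum(scores_a[i] for i in idx) / n - sum(scores_b[i] for i in idx) / n
        if diff * observed <= 0:  # resampled difference crosses zero
            flips += 1
    return observed, flips / n_boot

# e.g., with one evaluation score per checkpoint from each procedure:
# delta, p = bootstrap_seed_test(scores_procedure_a, scores_procedure_b)
```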
    A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation
    Yao Zhao
    Mirella Lapata
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Association for Computational Linguistics
We propose Composition Sampling, a simple but effective method to generate diverse outputs for conditional generation of higher quality compared to previous stochastic decoding strategies. It builds on recently proposed plan-based neural generation models (Narayan et al., 2021) that are trained to first create a composition of the output and then generate by conditioning on it and the input. Our approach avoids text degeneration by first sampling a composition in the form of an entity chain and then using beam search to generate the best possible text grounded to this entity chain. Experiments on summarization (CNN/DailyMail and XSum) and question generation (SQuAD), using existing and newly proposed automatic metrics together with human-based evaluation, demonstrate that Composition Sampling is currently the best available decoding strategy for generating diverse meaningful outputs.
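A hedged sketch of the two-step decoding recipe using a generic Hugging Face-style `generate` API; the `[PLAN]` separator and sampling hyper-parameters are assumptions, and the model is presumed to have been trained to emit an entity-chain plan before the text.

```python
def composition_sample(model, tokenizer, source: str, num_outputs: int = 4):
    """Sample a plan stochastically, then beam-search text grounded to it."""
    inputs = tokenizer(source, return_tensors="pt")
    outputs = []
    for _ in range(num_outputs):
        # Step 1: stochastically sample a composition (the entity-chain plan).
        plan_ids = model.generate(**inputs, do_sample=True, top_p=0.95,
                                  max_new_tokens=32)
        plan = tokenizer.decode(plan_ids[0], skip_special_tokens=True)
        # Step 2: deterministic beam search conditioned on input + sampled plan.
        cond = tokenizer(source + " [PLAN] " + plan, return_tensors="pt")
        text_ids = model.generate(**cond, num_beams=4, do_sample=False,
                                  max_new_tokens=128)
        outputs.append(tokenizer.decode(text_ids[0], skip_special_tokens=True))
    return outputs
```

Diversity comes from the sampled plans, while beam search keeps each realization fluent and grounded to its plan.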
The availability of large, high-quality datasets has been one of the main drivers of recent progress in question answering (QA). Such annotated datasets however are difficult and costly to collect, and rarely exist in languages other than English, rendering QA technology inaccessible to underrepresented languages. An alternative to building large monolingual training datasets is to leverage pre-trained language models (PLMs) under a few-shot learning setting. Our approach, QAmeleon, uses a PLM to automatically generate multilingual data upon which QA models are trained, thus avoiding costly annotation. Prompt tuning the PLM for data synthesis with only five examples per language delivers accuracy superior to translation-based baselines, bridges nearly 60% of the gap between an English-only baseline and a fully supervised upper bound trained on almost 50,000 hand labeled examples, and always leads to substantial improvements compared to fine-tuning a QA model directly on labeled examples in low resource settings. Experiments on the TyDiQA-GoldP and MLQA benchmarks show that few-shot prompt tuning for data synthesis scales across languages and is a viable alternative to large-scale annotation.
Large language models (LLMs) have shown impressive results across a variety of tasks while requiring little or no direct supervision. Further, there is mounting evidence that LLMs may have potential in information-seeking scenarios. We believe the ability of an LLM to attribute the text that it generates is likely to be crucial for both system developers and users in this setting. We propose and study Attributed QA as a key first step in the development of attributed LLMs. We develop a reproducible evaluation framework for the task, using human annotations as a gold standard and a correlated automatic metric that we show is suitable for development settings. We describe and benchmark a broad set of architectures for the task. Our contributions give some concrete answers to two key questions (how to measure attribution, and how well current state-of-the-art methods perform on it), and give some hints as to how to address a third key question (how to build LLMs with attribution).
    Increasing Faithfulness in Knowledge-Grounded Dialogue with Controllable Features
    Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (2021), pp. 704-718
Knowledge-grounded dialogue systems are intended to convey information that is based on evidence provided in a given source text. We discuss the challenges of training a generative neural dialogue model for such systems that is controlled to stay faithful to the evidence. Existing datasets contain a mix of conversational responses that are faithful to selected evidence as well as more subjective or chit-chat style responses. We propose different evaluation measures to disentangle these different styles of responses by quantifying the informativeness and objectivity. At training time, additional inputs based on these evaluation measures are given to the dialogue model. At generation time, these additional inputs act as stylistic controls that encourage the model to generate responses that are faithful to the provided evidence. We also investigate the usage of additional controls at decoding time using resampling techniques. In addition to automatic metrics, we perform a human evaluation study where raters judge the output of these controlled generation models to be generally more objective and faithful to the evidence compared to baseline dialogue systems.
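A toy sketch of the control-feature idea: measures computed on each training response are bucketed into discrete control tokens prepended to the model input. The token names, thresholds, and the crude first-person heuristic below are illustrative assumptions rather than the paper's exact measures.

```python
def add_controls(evidence: str, history: str, response: str) -> str:
    """Prefix a training example with bucketed style-control tokens."""
    # Crude proxy for faithfulness: lexical overlap with the evidence.
    overlap = len(set(response.lower().split()) & set(evidence.lower().split()))
    ratio = overlap / max(len(response.split()), 1)
    faithfulness = "<high-overlap>" if ratio > 0.5 else "<low-overlap>"
    # Crude proxy for objectivity: absence of first-person pronouns.
    objective = " i " not in f" {response.lower()} "
    objectivity = "<objective>" if objective else "<subjective>"
    return f"{faithfulness} {objectivity} {history} [EVIDENCE] {evidence}"
```

At generation time the desirable tokens (e.g., `<high-overlap> <objective>`) are supplied as fixed controls to steer decoding.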
We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.
Despite significant advances in text generation in recent years, evaluation metrics have lagged behind, with n-gram overlap metrics such as BLEU or ROUGE still remaining popular. In this work, we introduce BLEURT, a learnt evaluation metric based on BERT that achieves state-of-the-art performance on the last three years of the WMT Metrics Shared Task and the WebNLG challenge. A key aspect of our approach is a novel pre-training scheme that uses millions of synthetically constructed examples to increase generalization. We show that in contrast to a vanilla BERT fine-tuning approach, BLEURT yields superior results even in the presence of scarce, skewed, or out-of-domain training data.
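BLEURT is available as open-source software (https://github.com/google-research/bleurt); a short usage sketch with the public `bleurt` package follows, where the checkpoint path is a placeholder for whichever checkpoint you have downloaded locally.

```python
from bleurt import score

# Point this at a locally downloaded BLEURT checkpoint directory.
scorer = score.BleurtScorer("path/to/bleurt/checkpoint")
scores = scorer.score(
    references=["The cat sat on the mat."],
    candidates=["A cat was sitting on the mat."],
)
print(scores)  # one quality score per candidate-reference pair
```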
We propose a novel conditioned text generation model. It draws inspiration from traditional template-based text generation techniques, where the source provides the content (i.e., what to say), and the template influences how to say it. Building on the successful encoder-decoder paradigm, it first encodes the content representation from the given input text; to produce the output, it retrieves exemplar text from the training data as "soft templates," which are then used to construct an exemplar-specific decoder. We evaluate the proposed model on abstractive text summarization and data-to-text generation. Empirical results show that this model achieves strong performance and outperforms comparable baselines.
    What do you learn from context? Probing for sentence structure in contextualized word representations
    Patrick Xia
    Berlin Chen
    Alex Wang
    Adam Poliak
    R. Thomas McCoy
    Najoung Kim
    Benjamin Van Durme
    Samuel R. Bowman
    International Conference on Learning Representations (2019)
Contextualized representation models such as CoVe (McCann et al., 2017) and ELMo (Peters et al., 2018a) have recently achieved state-of-the-art results on a broad suite of downstream NLP tasks. Building on recent token-level probing work (Peters et al., 2018a; Blevins et al., 2018; Belinkov et al., 2017b; Shi et al., 2016), we introduce a broad suite of sub-sentence probing tasks derived from the traditional structured-prediction pipeline, including parsing, semantic role labeling, and coreference, and covering a range of syntactic, semantic, local, and long-range phenomena. We use these tasks to examine the word-level contextual representations and investigate how they encode information about the structure of the sentence in which they appear. We probe three recently-released contextual encoder models, and find that ELMo better encodes linguistic structure at the word level than do other comparable models. We find that the existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but only offer small improvements on semantic tasks over a non-contextual baseline.
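The probing recipe itself is simple: freeze the encoder, pool token vectors over a labeled span, and train only a light classifier on top. A minimal PyTorch sketch follows; mean pooling, dimensions, and the label inventory are illustrative simplifications of the paper's edge-probing architecture.

```python
import torch
import torch.nn as nn

class SpanProbe(nn.Module):
    """A light classifier over pooled span representations from a frozen encoder."""
    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_labels)

    def forward(self, token_reprs: torch.Tensor, start: int, end: int):
        # token_reprs: [seq_len, hidden_dim], detached so the encoder is untouched.
        span = token_reprs[start:end].mean(dim=0)  # simple mean pooling
        return self.classifier(span)

probe = SpanProbe(hidden_dim=768, num_labels=45)  # e.g., a POS tag inventory
reprs = torch.randn(12, 768)                      # stand-in for encoder output
logits = probe(reprs.detach(), start=3, end=5)
```

Because only the probe is trained, any gain over a non-contextual baseline is attributable to what the frozen representations encode.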
    BERT Rediscovers the Classical NLP Pipeline
    Association for Computational Linguistics (2019) (to appear)
Pre-trained sentence encoders such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have rapidly advanced the state-of-the-art on many NLP tasks, and have been shown to encode contextual information that can resolve many aspects of language structure. We extend the edge probing suite of Tenney et al. (2019) to explore the computation performed at each layer of the BERT model, and find that tasks derived from the traditional NLP pipeline appear in a natural progression: part-of-speech tags are processed earliest, followed by constituents, dependencies, semantic roles, and coreference. We trace individual examples through the encoder and find that while this order holds on average, the encoder occasionally inverts the order, revising low-level decisions after deciding higher-level contextual relations.
Automatically constructed datasets for generating text from semi-structured data (tables), such as WikiBio, often contain reference texts that diverge from the information in the corresponding semi-structured data. We show that metrics which rely solely on the reference texts, such as BLEU and ROUGE, show poor correlation with human judgments when those references diverge. We propose a new metric, PARENT, which aligns n-grams from the reference and generated texts to the semi-structured data before computing their precision and recall. Through a large scale human evaluation study of table-to-text models for WikiBio, we show that PARENT correlates with human judgments better than existing text generation metrics. We also adapt and evaluate the information extraction based evaluation proposed by Wiseman et al. (2017), and show that PARENT has comparable correlation to it, while being easier to use. We show that PARENT is also applicable when the reference texts are elicited from humans using the data from the WebNLG challenge.
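A much-simplified sketch of PARENT's central idea: credit a generated n-gram if it matches the reference or is supported by the table. The real metric uses soft alignment and combines this precision with a recall term; this toy version uses exact lexical overlap only.

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def entailed_precision(generated, reference, table_tokens, n=2):
    """Fraction of generated n-grams found in the reference or entailed by the table."""
    ref_set = set(ngrams(reference, n))
    gen = ngrams(generated, n)
    if not gen:
        return 0.0
    hits = sum(1 for g in gen
               if g in ref_set or all(w in table_tokens for w in g))
    return hits / len(gen)

p = entailed_precision(
    generated="john smith born 1970".split(),
    reference="john smith was born in 1970".split(),
    table_tokens={"john", "smith", "born", "1970"},
)
```

The key contrast with BLEU is the second disjunct: an n-gram absent from a divergent reference still earns credit when the table supports it.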
Understanding natural language queries is fundamental to many practical NLP systems. Often, such systems comprise a brittle processing pipeline that is not robust to "word salad" text ubiquitously issued by users. However, if a query resembles a grammatical and well-formed question, such a pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-well-formed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of a neural sequence-to-sequence model for generating questions for reading comprehension.
Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia's edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.
We release a corpus of atomic insertion edits: instances in which a human editor has inserted a single contiguous span of text into an existing sentence. Our corpus is derived from Wikipedia edit history and contains 43 million sentences across 8 different languages. We argue that the signal contained in these edits is valuable for research in semantics and discourse, and that such signal differs from that found in conventional language modeling corpora. We provide experimental evidence from both a corpus linguistics and a language modeling perspective to support these claims.
The reading comprehension task, which asks questions about a given evidence document, is a central problem in natural language understanding. Recent formulations of this task have typically focused on answer selection from a set of candidates pre-defined manually or through the use of an external NLP pipeline. However, Rajpurkar et al. (2016) recently released the SQuAD dataset in which the answers can be arbitrary strings from the supplied text. In this paper, we focus on this answer extraction task, presenting a novel model architecture that efficiently builds fixed length representations of all spans in the evidence document with a recurrent network. We show that scoring explicit span representations significantly improves performance over other approaches that factor the prediction into separate predictions about words or start and end markers. Our approach improves upon the best published results of Wang & Jiang (2016) by 5% and decreases the error of Rajpurkar et al.'s baseline by over 50%.
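A simplified sketch of explicit span scoring: build a fixed-length vector for every candidate span and score all spans jointly. The endpoint-concatenation features below stand in for the paper's recurrent span representations.

```python
import torch
import torch.nn as nn

def score_all_spans(token_reprs: torch.Tensor, scorer: nn.Linear, max_len: int = 10):
    """Enumerate spans up to max_len tokens and score each one explicitly."""
    seq_len, dim = token_reprs.shape
    spans, feats = [], []
    for i in range(seq_len):
        for j in range(i, min(i + max_len, seq_len)):
            spans.append((i, j))
            # Fixed-length span feature: concatenated endpoint vectors.
            feats.append(torch.cat([token_reprs[i], token_reprs[j]]))
    scores = scorer(torch.stack(feats)).squeeze(-1)  # one score per span
    best = scores.argmax().item()
    return spans[best], scores

scorer = nn.Linear(2 * 128, 1)
best_span, scores = score_all_spans(torch.randn(20, 128), scorer)
```

Scoring whole spans, rather than independent start and end positions, lets the model capture interactions between a span's boundaries.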
    Neural Paraphrase Identification of Questions with Noisy Pretraining
    Thyago Duque
    Oscar Täckström
    Jakob Uszkoreit
    Proceedings of the First Workshop on Subword and Character Level Models in NLP (2017)
We present a solution to the problem of paraphrase identification of questions. We focus on a recent dataset of question pairs annotated with binary paraphrase labels and show that a variant of the decomposable attention model (Parikh et al., 2016) results in accurate performance on this task, while being far simpler than many competing neural architectures. Furthermore, when the model is pretrained on a noisy dataset of automatically collected question paraphrases, it obtains the best reported performance on the dataset.
We propose a simple neural architecture for natural language inference. Our approach uses attention to decompose the problem into subproblems that can be solved separately, thus making it trivially parallelizable. On the Stanford Natural Language Inference (SNLI) dataset, we obtain state-of-the-art results with almost an order of magnitude fewer parameters than previous work and without relying on any word-order information. Adding intra-sentence attention that takes a minimum amount of order into account yields further improvements.
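A minimal PyTorch sketch of the model's attend / compare / aggregate structure; the feed-forward networks F, G, and H from the paper are reduced to single linear layers here, so this is an illustration of the decomposition rather than a faithful reimplementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposableAttention(nn.Module):
    def __init__(self, dim: int, num_classes: int = 3):
        super().__init__()
        self.attend = nn.Linear(dim, dim)                 # F in the paper
        self.compare = nn.Linear(2 * dim, dim)            # G
        self.aggregate = nn.Linear(2 * dim, num_classes)  # H

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        # a: [len_a, dim], b: [len_b, dim] word embeddings (no word order used).
        e = self.attend(a) @ self.attend(b).T         # soft alignment scores
        beta = F.softmax(e, dim=1) @ b                # b aligned to each token of a
        alpha = F.softmax(e, dim=0).T @ a             # a aligned to each token of b
        v1 = self.compare(torch.cat([a, beta], dim=-1)).sum(dim=0)
        v2 = self.compare(torch.cat([b, alpha], dim=-1)).sum(dim=0)
        return self.aggregate(torch.cat([v1, v2]))

model = DecomposableAttention(dim=64)
logits = model(torch.randn(7, 64), torch.randn(9, 64))
```

Each token pair is compared independently, which is what makes the model both small and trivially parallelizable.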
    Transforming Dependency Structures to Logical Forms for Semantic Parsing
    Siva Reddy
    Oscar Täckström
    Mark Steedman
    Mirella Lapata
    Transactions of the Association for Computational Linguistics, vol. 4 (2016)
The strongly typed syntax of grammar formalisms such as CCG, TAG, LFG and HPSG offers a synchronous framework for deriving syntactic structures and semantic logical forms. In contrast, partly due to the lack of a strong type system, dependency structures are easy to annotate and have become a widely used form of syntactic analysis for many languages. However, the lack of a type system makes a formal mechanism for deriving logical forms from dependency structures challenging. We address this by introducing a robust system based on the lambda calculus for deriving neo-Davidsonian logical forms from dependency trees. These logical forms are then used for semantic parsing of natural language to Freebase. Experiments on the Free917 and WebQuestions datasets show that our representation is superior to the original dependency trees and that it outperforms a CCG-based representation on this task. Compared to prior work, we obtain the strongest result to date on Free917 and competitive results on WebQuestions.
    Efficient Inference and Structured Learning for Semantic Role Labeling
    Oscar Täckström
    Transactions of the Association for Computational Linguistics, vol. 3 (2015), pp. 29-41
We present a dynamic programming algorithm for efficient constrained inference in semantic role labeling. The algorithm tractably captures a majority of the structural constraints examined by prior work in this area, which has resorted to either approximate methods or off-the-shelf integer linear programming solvers. In addition, it allows training a globally-normalized log-linear model with respect to constrained conditional likelihood. We show that the dynamic program is several times faster than an off-the-shelf integer linear programming solver, while reaching the same solution. Furthermore, we show that our structured model results in significant improvements over its local counterpart, achieving state-of-the-art results on both PropBank- and FrameNet-annotated corpora.
    Semantic Role Labeling with Neural Network Factors
    Oscar Täckström
    Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP '15), Association for Computational Linguistics
We present a new method for semantic role labeling in which arguments and semantic roles are jointly embedded in a shared vector space for a given predicate. These embeddings belong to a neural network, whose output represents the potential functions of a graphical model designed for the SRL task. We consider both local and structured learning methods and obtain strong results on standard PropBank and FrameNet corpora with a straightforward product-of-experts model. We further show how the model can learn jointly from PropBank and FrameNet annotations to obtain additional improvements on the smaller FrameNet dataset.
    Enhanced Search with Wildcards and Morphological Inflections in the Google Books Ngram Viewer
    Jason Mann
    David Zhang
    Lu Yang
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Demonstrations), Association for Computational Linguistics (2014)
    Semantic Frame Identification with Distributed Word Representations
    Karl Moritz Hermann
    Jason Weston
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (2014)
We present a novel technique for semantic frame identification using distributed representations of predicates and their syntactic context; this technique leverages automatic syntactic parses and a generic set of word embeddings. Given labeled data annotated with frame-semantic parses, we learn a model that projects the set of word representations for the syntactic context around a predicate to a low dimensional representation. The latter is used for semantic frame identification; with a standard argument identification method inspired by prior work, we achieve state-of-the-art results on FrameNet-style frame-semantic analysis. Additionally, we report strong results on PropBank-style semantic role labeling in comparison to prior work.
    Frame-Semantic Parsing
    Desai Chen
    André F. T. Martins
    Nathan Schneider
    Noah A. Smith
    Computational Linguistics, vol. 40:1 (2014), pp. 9-56
Frame semantics (Fillmore 1982) is a linguistic theory that has been instantiated for English in the FrameNet lexicon (Fillmore, Johnson, and Petruck 2003). We solve the problem of frame-semantic parsing using a two-stage statistical model that takes lexical targets (i.e., content words and phrases) in their sentential contexts and predicts frame-semantic structures. Given a target in context, the first stage disambiguates it to a semantic frame. This model employs latent variables and semi-supervised learning to improve frame disambiguation for targets unseen at training time. The second stage finds the target's locally expressed semantic arguments. At inference time, a fast exact dual decomposition algorithm collectively predicts all the arguments of a frame at once in order to respect declaratively stated linguistic constraints, resulting in qualitatively better structures than naïve local predictors. Both components are feature-based and discriminatively trained on a small set of annotated frame-semantic parses. On the SemEval 2007 benchmark dataset, the approach, along with a heuristic identifier of frame-evoking targets, outperforms the prior state of the art by significant margins. Additionally, we present experiments on the much larger FrameNet 1.5 dataset. We have released our frame-semantic parser as open-source software.
    Learning Compact Lexicons for CCG Semantic Parsing
    Yoav Artzi
    Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP '14)
    Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
Oscar Täckström
    Ryan McDonald
    Joakim Nivre
Transactions of the Association for Computational Linguistics, vol. 1 (2013), pp. 1-12
    Universal Dependency Annotation for Multilingual Parsing
    Ryan McDonald
    Joakim Nivre
    Yoav Goldberg
    Yvonne Quirmbach-Brundage
    Keith Hall
Oscar Täckström
    Claudia Bedini
    Nuria Bertomeu Castello
    Jungmee Lee
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013), Association for Computational Linguistics
    Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
    Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resource-impoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and named-entity segmentation.
    A Universal Part-of-Speech Tagset
    Ryan McDonald
    Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC '12) (2012)
    Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
    Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL '11) (2011), Best Paper Award
We describe a novel approach for inducing unsupervised part-of-speech taggers for languages that have no labeled training data, but have translated text in a resource-rich language. Our method does not assume any knowledge about the target language (in particular no tagging dictionary is assumed), making it applicable for a wide array of resource-poor languages. We use graph-based label propagation for cross-lingual knowledge transfer and use the projected labels as constraints in an unsupervised model. Across six European languages, our approach results in an average absolute improvement of 9.7% over the state-of-the-art baseline, and 17.0% over vanilla hidden Markov models induced with EM.
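A toy sketch of the label propagation step at the heart of this approach: each node's tag distribution is iteratively averaged with its neighbors', while seed nodes (whose labels were projected across word alignments) stay clamped. The paper's graph construction and objective are more involved than this illustration.

```python
def propagate(graph, seed_labels, num_tags, iters=10):
    """graph: {node: [(neighbor, weight), ...]}; seed_labels: {node: tag_id}."""
    # Start every node at the uniform distribution, then clamp the seeds.
    q = {v: [1.0 / num_tags] * num_tags for v in graph}
    for v, t in seed_labels.items():
        q[v] = [1.0 if k == t else 0.0 for k in range(num_tags)]
    for _ in range(iters):
        new_q = {}
        for v, nbrs in graph.items():
            if v in seed_labels:
                new_q[v] = q[v]  # seeds stay fixed across iterations
                continue
            total = sum(w for _, w in nbrs) or 1.0
            new_q[v] = [sum(w * q[u][k] for u, w in nbrs) / total
                        for k in range(num_tags)]
        q = new_q
    return q
```

The smoothed distributions over unlabeled target-language nodes then serve as soft constraints when training the unsupervised tagger.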
    An Exact Dual Decomposition Algorithm for Shallow Semantic Parsing with Constraints
    André F. T. Martins
    Noah A. Smith
Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM 2012), Association for Computational Linguistics
    Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties
    Noah A. Smith
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2012), Association for Computational Linguistics
    Semi-Supervised and Latent-Variable Models of Natural Language Semantics
    Ph.D. Thesis, Carnegie Mellon University (2012)
    Semi-Supervised Frame-Semantic Parsing for Unknown Predicates
    Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011), Association for Computational Linguistics
    Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments
    Kevin Gimpel
    Nathan Schneider
    Brendan O'Connor
    Daniel Mills
    Jacob Eisenstein
    Michael Heilman
    Dani Yogatama
    Jeffrey Flanigan
    Noah A. Smith
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011), Association for Computational Linguistics
    Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
    Shay B. Cohen
    Noah A. Smith
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics (2011)