Kishore Papineni

Kishore leads the Coauthor team, whose objective is cross-lingual, cross-modal access to dynamically organized information. His team hopes to make content consumption and creation a richer experience by surfacing relevant and diverse information from the web, possibly synthesized dynamically from across different sources or content types such as text, images, charts, and videos. The Coauthor team powers the web content suggestions in Google Docs when users are writing a document, and is working on additional content recommendation applications. His work at Google includes the veracity of information on the web, the depth of discourse on a topic in a document, the drift of discourse on a topic across the web, identifying concepts peculiar to a collection of documents and the relationships among them, and identifying different perspectives in content. His past work was in the areas of automatic control theory, natural language understanding, dialog management, machine translation, and display advertising. Prior to joining Google, he led machine learning at Yahoo! Research and machine translation at IBM Research. He is a coauthor of the BLEU metric for automatic evaluation of machine translation quality, which received a 2018 Test-of-Time award in computational linguistics. He was a founding Editor-in-Chief of ACM Transactions on Speech and Language Processing from 2003 to 2007.
Authored Publications
    Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
    Yuchen Li
    Alexandre Kirchmeyer
    Aashay Mehta
    Yilong Qin
    Andrej Risteski
     International Conference on Machine Learning (2024) (to appear)
     Autoregressive language models are the currently dominant paradigm for text generation, but they have fundamental limitations that cannot be remedied by scale, such as inherently sequential and unidirectional generation. While alternate classes of models have been explored, we have limited mathematical understanding of their fundamental power and limitations. In this paper we focus on Generative Masked Language Models (GMLMs), a non-autoregressive paradigm in which we train a model to fit conditional probabilities of the data distribution via masking, which are subsequently used as inputs to a Markov chain to draw samples from the model. These models empirically strike a promising speed-quality trade-off, as each refinement step decodes the entire sequence in parallel. We develop a mathematical framework for analyzing and improving such models, which sheds light on questions of sample complexity, inference speed, and quality. Empirically, we adapt the T5 model for iteratively-refined parallel decoding, achieving a 2-3x speedup in machine translation with minimal sacrifice in quality compared with autoregressive models. We run careful ablation experiments to give recommendations on key design choices, and make fine-grained observations on the common error modes in connection with our theory. Our mathematical analyses and empirical observations characterize both the potential and the limitations of this approach, and can inform future work on improving the understanding and performance of GMLMs.
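As a rough illustration of the iteratively-refined parallel decoding described in the abstract, the sketch below runs a mask-predict style loop: decode every position in parallel, then re-mask the least confident positions and refine them on the next pass. Everything here is hypothetical (the toy_conditionals stand-in model, the tiny vocabulary, the linear re-masking schedule); it is not the paper's T5-based implementation.

```python
# Minimal sketch of iteratively-refined parallel decoding with a generative
# masked language model. `toy_conditionals` is a stand-in for a trained GMLM;
# it is NOT the paper's T5 adaptation.
import numpy as np

VOCAB = ["<mask>", "the", "cat", "sat", "on", "a", "mat"]
MASK_ID = 0

def toy_conditionals(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for a masked LM: returns a (seq_len, vocab) matrix of
    conditional probabilities for every position given the current tokens."""
    rng = np.random.default_rng(int(tokens.sum()))     # deterministic toy scores
    logits = rng.normal(size=(len(tokens), len(VOCAB)))
    logits[:, MASK_ID] = -1e9                          # never predict <mask>
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return probs

def parallel_decode(seq_len: int, num_steps: int = 4) -> list[str]:
    tokens = np.full(seq_len, MASK_ID)                 # start fully masked
    for step in range(num_steps):
        probs = toy_conditionals(tokens)               # one parallel model call
        tokens = probs.argmax(axis=1)                  # decode all positions at once
        confidence = probs.max(axis=1)
        # Re-mask the least confident positions and refine them in the next step.
        num_to_mask = int(seq_len * (1 - (step + 1) / num_steps))
        if num_to_mask > 0:
            remask = np.argsort(confidence)[:num_to_mask]
            tokens[remask] = MASK_ID
    return [VOCAB[t] for t in tokens]

if __name__ == "__main__":
    print(parallel_decode(seq_len=6))
```

Each step is one forward pass over the whole sequence, which is where the speedup over token-by-token autoregressive decoding comes from.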
     It is generally believed that robust training of extremely large networks is critical to their success in real-world applications. However, when taken to the extreme, methods that promote robustness can hurt the model's sensitivity to rare or underrepresented patterns. In this paper, we discuss this trade-off between sensitivity and robustness to natural (non-adversarial) perturbations by introducing two notions: contextual feature utility and contextual feature sensitivity. We propose Feature Contrastive Learning (FCL), which encourages a model to be more sensitive to the features that have higher contextual utility. Empirical results demonstrate that models trained with FCL achieve a better balance of robustness and sensitivity, leading to improved generalization in the presence of noise on both vision and NLP datasets.
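The general notion of feature sensitivity can be illustrated with a simple input-gradient probe: how much does the model's predicted score move when an individual input feature moves? This is only a generic sketch with a made-up classifier; the paper's formal definitions of contextual feature utility and sensitivity, and the FCL objective itself, are not reproduced here.

```python
# Illustrative probe of how sensitive a model's output is to individual input
# features, via input gradients. Generic sketch only; not the paper's method.
import torch

model = torch.nn.Sequential(                 # hypothetical small classifier
    torch.nn.Linear(8, 16), torch.nn.ReLU(), torch.nn.Linear(16, 2)
)

x = torch.randn(1, 8, requires_grad=True)    # one example with 8 input features
logits = model(x)
target_logit = logits[0, logits.argmax()]    # score of the predicted class
target_logit.backward()

# Per-feature sensitivity: magnitude of the gradient of the predicted score
# with respect to each input feature.
sensitivity = x.grad.abs().squeeze(0)
print(sensitivity)
```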
     Document and discourse segmentation are two fundamental NLP tasks concerned with breaking text into constituent parts, and are commonly used to help downstream tasks such as information retrieval or text summarization. In this work, we propose three transformer-based architectures and provide comprehensive comparisons with previously proposed approaches on three standard datasets. We establish a new state of the art, in particular reducing error rates by a large margin in all cases. We further analyze model sizes and find that we can build models with many fewer parameters while retaining good performance, facilitating real-world applications.
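One common way to cast segmentation with transformers is as per-sentence boundary classification, sketched below. This is a generic formulation for illustration only, not one of the three architectures proposed in the paper; the encoder, dimensions, and decision threshold are all assumptions.

```python
# Minimal sketch: text segmentation as per-sentence boundary classification
# with a transformer encoder over precomputed sentence embeddings.
import torch

class SegmentationHead(torch.nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        layer = torch.nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = torch.nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = torch.nn.Linear(dim, 1)    # boundary / no boundary

    def forward(self, sentence_embeddings: torch.Tensor) -> torch.Tensor:
        # sentence_embeddings: (batch, num_sentences, dim), one vector per sentence
        contextual = self.encoder(sentence_embeddings)
        return torch.sigmoid(self.classifier(contextual)).squeeze(-1)

model = SegmentationHead()
doc = torch.randn(1, 10, 64)                  # 10 sentences, dummy embeddings
boundary_probs = model(doc)                   # probability each sentence ends a segment
segments = (boundary_probs > 0.5).nonzero()   # predicted segment boundaries
print(boundary_probs.shape, segments)
```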
    Maximally representative allocations for guaranteed delivery advertising campaigns
    R. Preston McAfee
     Review of Economic Design, 17 (2013), pp. 83-94
     There are around 400 advertising networks that match "display" advertising opportunities on web pages, which include banner ads, video ads, and indeed all ads other than text-based ads, with candidate advertisements. This is about a $25 billion business annually. The present study derives a method of pricing such advertisements based on their relative scarcity while ensuring that all campaigns obtain a reasonably representative sample of the relevant opportunities. The mechanism is well-behaved under supply uncertainty. A method based on the mechanism described in this paper was implemented by Yahoo! Inc.
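The intuition behind a "representative" allocation can be shown with a toy example: each campaign receives the same fraction of every supply type, so its delivered impressions mirror the overall mix of matching opportunities. The numbers and the simple proportional rule below are illustrative assumptions (and assume every campaign is eligible for every supply type); the paper derives its allocation and scarcity-based prices from an optimization problem not reproduced here.

```python
# Toy illustration of a "representative" allocation: every campaign gets the
# same proportion of each supply pool, so its delivered impressions look like
# a scaled-down sample of all matching opportunities. Intuition only; not the
# paper's actual pricing/allocation mechanism.
supply = {"sports": 1_000_000, "news": 500_000, "finance": 250_000}  # impressions by type
demands = {"campaign_A": 700_000, "campaign_B": 350_000}             # contracted totals

total_supply = sum(supply.values())
allocation = {
    campaign: {pool: demand / total_supply * avail for pool, avail in supply.items()}
    for campaign, demand in demands.items()
}

for campaign, pools in allocation.items():
    # Each campaign's sports/news/finance mix mirrors the overall supply mix,
    # and its allocated impressions sum to its contracted demand.
    print(campaign, {pool: round(x) for pool, x in pools.items()})
```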
    Bidding for Representative Allocations for Display Advertising
    Arpita Ghosh
    Randolph Preston McAfee
     WINE (2009), pp. 208-219
    Bidding for Representative Allocations for Display Advertising
    Arpita Ghosh
    Randolph Preston McAfee
     CoRR, abs/0910.0880 (2009)