Krishna Srinivasan

Krishna Srinivasan

Krishna is a Research Software Engineer at Google Research. His primary research interests include applications of Machine Learning to problems in Natural Language Processing and Information Retrieval.
Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
    Characterizing Image Accessibility on Wikipedia across Languages
    Elisa Kreiss
    Tiziano Piccardi
    Jesus Adolfo Hermosillo
    Michael S. Bernstein
    Meredith Ringel Morris
    Christopher Potts
    Wiki Workshop 2023 (to appear)
    Preview abstract We make a first attempt to characterize image accessibility on Wikipedia across languages, present new experimental results that can inform efforts to assess description quality, and offer some strategies to improve Wikipedia's image accessibility. View details
    Preview abstract Query-document relevance prediction is a critical problem in Information Retrieval systems. This problem has increasingly been tackled using (pretrained) transformer-based models which are finetuned using large collections of labeled data. However, in specialized domains such as e-commerce and healthcare, the viability of this approach is limited by the dearth of large in-domain data. To address this paucity, recent methods leverage these powerful models to generate high-quality task and domain-specific synthetic data. Prior work has largely explored synthetic data generation or query generation (QGen) for Question-Answering (QA) and binary (yes/no) relevance prediction, where for instance, the QGen models are given a document, and trained to generate a query relevant to that document. However in many problems, we have a more fine-grained notion of relevance than a simple yes/no label. Thus, in this work, we conduct a detailed study into how QGen approaches can be leveraged for nuanced relevance prediction. We demonstrate that – contrary to claims from prior works – current QGen approaches fall short of the more conventional cross-domain transfer-learning approaches. Via empirical studies spanning three public e-commerce benchmarks, we identify new shortcomings of existing QGen approaches – including their inability to distinguish between different grades of relevance. To address this, we introduce label-conditioned QGen models which incorporates knowledge about the different relevance. While our experiments demonstrate that these modifications help improve performance of QGen techniques, we also find that QGen approaches struggle to capture the full nuance of the relevance label space and as a result the generated queries are not faithful to the desired relevance label. View details
    Preview abstract Pretrained, large, generative language models (LMs) have had great success in a wide range of sequence tagging and structured prediction tasks. Casting a sequence tagging task as a Seq2Seq problem requires deciding the formats of the input and output sequences. However, we lack a principled understanding of the trade-offs associated with these formats (such as the effect on model accuracy, sequence length, multilingual generalization, hallucination). In this paper, we rigorously study different *formats* one could use for casting input text sentences and their output labels into the "input" and "target" of a Seq2Seq model. Along the way, we introduce a new format, which we show to not only be simpler but also more effective. Additionally the new formats demonstrate significant gains in the multilingual settings -- both zero-shot transfer learning and joint training. Lastly, we find that the new formats are more robust and almost completely devoid of the danger of *hallucination* that often plagues existing formats. With well over a 1000 experiments studying 14 different formats, over 7 diverse public benchmarks -- including 3 multilingual datasets spanning 7 languages -- we believe our findings provide a strong empirical basis in understanding how we should tackle sequence tagging tasks. View details
    Preview abstract Large Language Models (LLMs) have shown impressive results on a variety of text understanding tasks. Search queries though pose a unique challenge, given their short-length and lack of nuance or context. Complicated feature engineering efforts do not always lead to downstream improvements as their performance benefits may be offset by increased complexity of knowledge distillation. Thus, in this paper we make the following contributions: (1) We demonstrate that Retrieval Augmentation of queries provides LLMs with valuable additional context enabling improved understanding. While Retrieval Augmentation typically increases latency of LMs (thus hurting distillation efficacy), (2) we provide a practical and effective way of distilling Retrieval Augmentation LLMs. Specifically, we use a novel two-stage distillation approach that allows us to carry over the gains of retrieval augmentation, without suffering the increased compute typically associated with it. (3) We demonstrate the benefits of the proposed approach on a billion-scale, real-world query understanding system resulting in an X\% improvement. Via extensive experiments, including on public benchmarks, we believe this work offers a recipe for practical use of retrieval-augmented query understanding. View details
    WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning
    Jiecao Chen
    Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '21)(2021)
    Preview abstract The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and Vision tasks. Multimodal modeling techniques aim to leverage high-quality visio-linguistic datasets for learning complementary information (across image and text modalities). In this paper, we introduce the Wikipedia-based Image Text (WIT) Dataset to better facilitate multimodal, multilingual learning. WIT is composed of 11 million+ unique images with over 37 million entity rich text descriptions associated with these images in Wikipedia from over 100 languages. Its size enables WIT to be used as a pretraining dataset for multimodal models, as we show when applied to downstream tasks such as image-text retrieval. WIT has four main and unique advantages. First, WIT is the largest multimodal dataset (at the time of writing). Second, it is massively multilingual (first of its kind) with coverage over 100+ languages (each of which has at least 10K examples) and provides cross-lingual texts for many images. Third, it represents a more diverse set of concepts and real world entities relative to what previous datasets cover. Lastly, as we demonstrate empirically, WIT provides a very challenging real-world test set that empirically highlights the need for learning improvements in tasks such as Retrieval and Captioning. View details
    Preview abstract Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages. We use both types of pairs in MURAL (MUltimodal, MUltitask Representations Across Languages), a dual encoder that solves two tasks: 1) image-text matching and 2) translation pair matching. By incorporating billions of translation pairs, MURAL extends ALIGN \cite{jia2021scaling}--a state-of-the-art dual encoder learned from 1.8 billion noisy image-text pairs. When using the same encoders, MURAL's performance matches or exceeds ALIGN's cross-modal retrieval performance on well-resourced languages across several datasets; more importantly, it considerably improves performance on under-resourced languages, showing that text-text learning can overcome a paucity of image-caption examples for these languages. On the Wikipedia Image-Text dataset, for example, MURAL improves zero-shot mean recall by 14.4\% on average for eight under-resourced languages and by 6.6\% on average when fine-tuning. Interestingly, we also find that text representations learned from MURAL cluster based on areal linguistics as well, like the Balkan sprachbund, and not just language genealogy. View details
    Preview abstract Pre-trained multilingual language models such as mBERT have shown immense gains for several natural language processing (NLP) tasks, especially in the zero-shot cross-lingual setting. Most, if not all, of these pre-trained models rely on the masked-language modeling (MLM) objective as the key language learning objective. The principle behind these approaches is that predicting the masked words with the help of the surrounding text helps learn potent contextualized representations. Despite the strong representation learning capability enabled by MLM, we demonstrate an inherent limitation of MLM for multilingual representation learning. In particular, by requiring the model to predict the language-specific token, the MLM objective disincentivizes learning a language-agnostic representation -- which is a key goal of multilingual pre-training. Therefore to encourage better cross-lingual representation learning we propose the DICT-MLM method. DICT-MLM works by incentivizing the model to be able to predict not just the original masked word, but potentially any of its cross-lingual synonyms as well. Our empirical analysis on multiple downstream tasks spanning 30+ languages, demonstrates the efficacy of the proposed approach and its ability to learn better multilingual representations. View details
    No Results Found