Manaal Faruqui

Manaal Faruqui is a Research Scientist at Google working on industry-scale natural language processing problems. He received a Ph.D. in 2016 from the Language Technologies Institute, School of Computer Science at Carnegie Mellon University. Before that, he completed an undergraduate degree in Computer Science and Engineering in 2012 from the Indian Institute of Technology, Kharagpur. His work on using semantic lexicons to improve word embeddings won a best paper award at NAACL 2015.

Here is his Google Scholar Profile.

Authored Publications
    A baseline method for running bidirectional models such as BERT in a streaming NLU setting is to rerun the model for each new (sub)token received. No previously computed features are reused: at each timestep the model restarts from scratch on the new prefix that ends with the newly received token. This leads to computational inefficiency, measured as FLOP count (lower is better). The proposed method addresses this issue by reducing the FLOP count of computing bidirectional features in the streaming setting, and it also improves performance on, and generalization to, incomplete inputs (partials). It has two components: a partially bidirectional encoder model and an adapter that guides the restarts of the bidirectional layer. Our evaluations on 4 sequence tagging datasets show that these gains are achieved while maintaining similar performance on complete inputs.
    As more users across the world interact with dialog agents in their daily lives, the dynamics between research in automatic speech recognition (ASR) and natural language understanding (NLU) call for renewed attention. We briefly review these research areas and lay out the current relationship between them. In light of the observations we make in this paper, we argue that (1) NLU should be cognizant of the ASR models used upstream in a dialog system's pipeline, (2) ASR should be able to learn from errors found in NLU, (3) there is a need for end-to-end datasets that provide semantic annotations on spoken input, and (4) there should be stronger collaboration between the ASR and NLU research communities.
    TimeDial: Temporal Commonsense Reasoning in Dialog
    Lianhui Qin
    Aditya Gupta
    Yejin Choi
    Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics (2021)
    Everyday conversations require understanding everyday events, which in turn requires understanding the temporal commonsense concepts interwoven with those events. Despite recent progress with massive pre-trained language models (LMs) such as T5 and GPT-3, their capability for temporal reasoning in dialogs remains largely under-explored. In this paper, we present the first study to investigate pre-trained LMs for their temporal reasoning capabilities in dialogs by introducing a new task and a crowd-sourced English challenge set, TimeDial. We formulate TimeDial as a multiple-choice cloze task with over 1.1K carefully curated dialogs. Empirical results demonstrate that even the best-performing models struggle on this task compared to humans, with a gap of 23 absolute points in accuracy. Furthermore, our analysis reveals that the models fail to reason about dialog context correctly; instead, they rely on shallow cues based on existing temporal patterns in context, motivating future research on modeling temporal concepts in text and on robust contextual reasoning about them. The dataset is publicly available at https://github.com/google-research-datasets/timedial.
    Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering
    Aditya Gupta
    Jiacheng Xu
    Diyi Yang
    Findings of the Association for Computational Linguistics: ACL 2021, Association for Computational Linguistics
    Disfluency is an under-studied topic in NLP, even though it is ubiquitous in human conversation. This is largely due to the lack of datasets containing disfluencies. In this paper, we present a new challenge question answering dataset, Disfl-QA, a derivative of SQuAD, in which humans introduce contextual disfluencies into previously fluent questions. Disfl-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text than prior datasets did. Experiments show that the performance of existing state-of-the-art question answering models degrades significantly when they are tested on Disfl-QA in a zero-shot setting. We show that data augmentation methods partially recover the loss in performance, and we also demonstrate the efficacy of using gold data for fine-tuning. We argue that large-scale disfluency datasets are needed for NLP models to become robust to disfluencies. The dataset is publicly available at: https://github.com/google-research-datasets/disfl-qa.
    We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. To obtain generated targets that are natural but also faithful to the source table, we introduce a dataset construction process where annotators directly revise existing candidate sentences from Wikipedia. We present systematic analyses of our dataset and annotation process as well as results achieved by several state-of-the-art baselines. While usually fluent, existing methods often hallucinate phrases that are not supported by the table, suggesting that this dataset can serve as a useful research benchmark for high-precision conditional text generation.
    We present a large-scale dataset for the task of rewriting an ill-formed natural language question into a well-formed one. Our multi-domain question rewriting (MQR) dataset is constructed from human-contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs which come from 303 domains. We provide human annotations for a subset of the dataset as a quality estimate. When moving from ill-formed to well-formed questions, the question quality improves by an average of 45 points across three aspects. We train sequence-to-sequence neural models on the constructed dataset and obtain an improvement of 13.2% in BLEU-4 over baseline methods built from other data resources.
    Automatically constructed datasets for generating text from semi-structured data (tables), such as WikiBio, often contain reference texts that diverge from the information in the corresponding semi-structured data. We show that metrics which rely solely on the reference texts, such as BLEU and ROUGE, show poor correlation with human judgments when those references diverge. We propose a new metric, PARENT, which aligns n-grams from the reference and generated texts to the semi-structured data before computing their precision and recall. Through a large-scale human evaluation study of table-to-text models for WikiBio, we show that PARENT correlates with human judgments better than existing text generation metrics. We also adapt and evaluate the information-extraction-based evaluation proposed by Wiseman et al. (2017), and show that PARENT has comparable correlation to it while being easier to use. We show that PARENT is also applicable when the reference texts are elicited from humans using the data from the WebNLG challenge.
    We propose a novel conditioned text generation model. It draws inspiration from traditional template-based text generation techniques, where the source provides the content (i.e., what to say) and the template influences how to say it. Building on the successful encoder-decoder paradigm, it first encodes the content representation from the given input text. To produce the output, it retrieves exemplar text from the training data as "soft templates," which are then used to construct an exemplar-specific decoder. We evaluate the proposed model on abstractive text summarization and data-to-text generation. Empirical results show that this model achieves strong performance and outperforms comparable baselines.
    Split and rephrase is the task of breaking down a sentence into shorter ones that together convey the same meaning. We extract a rich new dataset for this task by mining Wikipedia's edit history: WikiSplit contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task. Incorporating WikiSplit as training data produces a model with qualitatively better predictions that score 32 BLEU points above the prior best result on the WebSplit benchmark.
    We release a corpus of atomic insertion edits: instances in which a human editor has inserted a single contiguous span of text into an existing sentence. Our corpus is derived from Wikipedia edit history and contains 43 million sentences across 8 different languages. We argue that the signal contained in these edits is valuable for research in semantics and discourse, and that such signal differs from that found in conventional language modeling corpora. We provide experimental evidence from both a corpus linguistics and a language modeling perspective to support these claims.
    Understanding natural language queries is fundamental to many practical NLP systems. Often, such systems comprise a brittle processing pipeline that is not robust to the "word salad" text ubiquitously issued by users. However, if a query resembles a grammatical and well-formed question, such a pipeline is able to perform more accurate interpretation, thus reducing downstream compounding errors. Hence, identifying whether or not a query is well formed can enhance query understanding. Here, we introduce a new task of identifying a well-formed natural language question. We construct and release a dataset of 25,100 publicly available questions classified into well-formed and non-well-formed categories and report an accuracy of 70.7% on the test set. We also show that our classifier can be used to improve the performance of a neural sequence-to-sequence model for generating questions for reading comprehension.
    GHH at SemEval-2018 Task 10: Discovering Discriminative Attributes in Distributional Semantics
    Younes Samih
    Wolfgang Maier
    SemEval 2018 Task 10 on Capturing Discriminative Attributes (2018)
    This paper describes our system submission to the SemEval 2018 Task 10 on Capturing Discriminative Attributes. Given two concepts and an attribute, the task is to determine whether the attribute is semantically related to one concept and not the other. In this work, we assume that discriminative attributes can be detected by discovering the association (or lack of association) between a pair of words. The hypothesis we test in this contribution is whether the semantic difference between two pairs of concepts can be treated in terms of measuring the distance between words in a vector space, or can simply be obtained as a by-product of word co-occurrence counts.
    (Almost) Zero-Shot Cross-Lingual Spoken Language Understanding
    Gokhan Tur
    Dilek Hakkani-Tur
    Larry Heck
    Proceedings of the IEEE ICASSP (2018)
    Spoken language understanding (SLU) is a component of goal-oriented dialogue systems that aims to interpret users' natural language queries in the system's semantic representation format. While current state-of-the-art SLU approaches achieve high performance for English domains, the same is not true for other languages. Approaches in the literature for extending SLU models and grammars to new languages rely primarily on machine translation. This poses a challenge in scaling to new languages, as machine translation systems may not be reliable for several (especially low-resource) languages. In this work, we examine different approaches to training an SLU component with little supervision for two new languages, Hindi and Turkish, and show that with only a few hundred labeled examples we can surpass the approaches proposed in the literature. Our experiments show that training a model bilingually (i.e., jointly with English) enables faster learning, in that the model requires fewer labeled instances in the target language to generalize. Qualitative analysis shows that rare slot types benefit the most from bilingual training.