Annie Louis
Authored Publications
Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to a quadratic increase in compute with input length. In practice, however, the input text of many NLP tasks can be seen as a sequence of related segments (e.g., the sequence of sentences within a passage, or the hypothesis and premise in NLI). While attending across these segments is highly beneficial for many tasks, we hypothesize that this interaction can be delayed until later encoding stages. To this end, we introduce Layer-adjustable Interactions in Transformers (LAIT). Within LAIT, segmented inputs are first encoded independently, and then jointly. This partial two-tower architecture bridges the gap between a Dual Encoder's ability to pre-compute representations for segments and a fully self-attentive Transformer's capacity to model cross-segment attention. Moreover, LAIT can be introduced only at finetuning time, effectively converting an existing pretrained Transformer into a hybrid of the two aforementioned architectures, and providing intuitive control over the performance-efficiency tradeoff. Experimenting on a wide range of NLP tasks, we find LAIT to significantly improve efficiency while preserving accuracy.
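As a rough sketch of the idea (not the paper's implementation), the first few encoder layers below run over each segment independently, and the remaining layers attend over the concatenation; the layer counts and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LAITStyleEncoder(nn.Module):
    """Toy partial two-tower encoder: independent layers, then joint layers."""

    def __init__(self, d_model=256, n_heads=4, n_layers=6, n_independent=3):
        super().__init__()
        layer = lambda: nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.independent = nn.ModuleList(layer() for _ in range(n_independent))
        self.joint = nn.ModuleList(layer() for _ in range(n_layers - n_independent))

    def forward(self, segments):
        # Encode each segment with no cross-segment attention; these
        # representations could be pre-computed and cached, as with a
        # Dual Encoder.
        encoded = []
        for seg in segments:
            for lyr in self.independent:
                seg = lyr(seg)
            encoded.append(seg)
        # Concatenate and let the remaining layers model cross-segment
        # interactions, as in a standard Transformer.
        x = torch.cat(encoded, dim=1)
        for lyr in self.joint:
            x = lyr(x)
        return x

premise, hypothesis = torch.randn(2, 16, 256), torch.randn(2, 12, 256)
out = LAITStyleEncoder()([premise, hypothesis])  # shape (2, 28, 256)
```

Raising or lowering `n_independent` is the knob the abstract alludes to: more independent layers means more pre-computable work, fewer means more cross-segment modeling.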
Conditional Generation with a Question-Answering Blueprint
Reinald Kim Amplayo
Fantine Huot
Mirella Lapata
Transactions of the Association for Computational Linguistics (2023) (to appear)
The ability to convey relevant and faithful information is critical for many tasks in conditional generation and yet remains elusive for neural seq-to-seq models whose outputs often reveal hallucinations and fail to correctly cover important details. In this work, we advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. Our work proposes a new conceptualization of text plans as a sequence of question-answer (QA) pairs. We enhance existing datasets (e.g., for summarization) with a QA blueprint operating as a proxy for both content selection (i.e., what to say) and planning (i.e., in what order). We obtain blueprints automatically by exploiting state-of-the-art question generation technology and convert input-output pairs into input-blueprint-output tuples. We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output (e.g., as a global plan or iteratively). Evaluation across metrics and datasets demonstrates that blueprint models are more factual than alternatives which do not resort to planning and allow tighter control of the generation output.
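To make the blueprint idea concrete, here is a small illustrative sketch of how QA pairs might be serialized in front of the target summary so a standard seq-to-seq model learns to plan before it writes; the separators and helper below are assumptions for illustration, not the paper's exact format.

```python
# Hypothetical serialization of a QA blueprint plus summary into a single
# seq-to-seq training target (the paper's exact format may differ).
blueprint = [
    ("Who announced the merger?", "Acme Corp"),
    ("When does it take effect?", "next quarter"),
]
summary = "Acme Corp announced a merger that takes effect next quarter."

def to_target(blueprint, summary):
    # Content selection and ordering are made explicit as QA pairs that
    # precede the summary text itself.
    plan = " ".join(f"Q: {q} A: {a}" for q, a in blueprint)
    return f"{plan} SUMMARY: {summary}"

print(to_target(blueprint, summary))
```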
A typical product or place often has hundreds of reviews, and summarization of these texts is an important and challenging problem. Recent progress on abstractive summarization in domains such as news has been driven by supervised systems trained on hundreds of thousands of news articles paired with human-written summaries. However, for opinion texts, such large-scale datasets are rarely available. Unsupervised methods, self-training, and few-shot learning approaches bridge that gap. In this work, we present OpineSum, a novel self-training approach for abstractive opinion summarization. The summaries in this approach are built using a novel application of textual entailment and capture the consensus of opinions across the various reviews for an item. This method can be used to obtain silver-standard summaries on a large scale and train both unsupervised and few-shot abstractive summarization systems. OpineSum achieves state-of-the-art performance in both settings.
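A minimal sketch of the consensus idea, under stated assumptions: sentences that many reviews entail become the silver-standard summary. The `entails` stand-in below is a trivial lexical heuristic; in practice it would be an off-the-shelf NLI model.

```python
def entails(premise: str, hypothesis: str) -> bool:
    # Trivial lexical stand-in for a real textual-entailment model.
    return hypothesis.lower() in premise.lower()

def silver_summary(reviews, k=2):
    candidates = [s.strip() for r in reviews for s in r.split(".") if s.strip()]
    # Score each candidate sentence by how many reviews entail it; the
    # top-scoring sentences approximate the consensus across reviews.
    scores = {c: sum(entails(r, c) for r in reviews) for c in candidates}
    return sorted(set(candidates), key=scores.get, reverse=True)[:k]

reviews = ["The rooms are clean. Staff were friendly.",
           "Clean rooms and a great view.",
           "Loved the view. The rooms are clean."]
print(silver_summary(reviews))  # e.g. ['The rooms are clean', 'Loved the view']
```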
Resolving Indirect Referring Expressions for Entity Selection
Silvia Pareti
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL 2023)
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address the problem of reference resolution, when people use natural expressions to choose between real world entities. For example, given the choice 'Should we make a Simnel cake or a Pandan cake?', a natural response from a non-expert may be indirect: 'let's make the green one'. Such natural expressions have been little studied for reference resolution. We argue that robustly understanding such language has large potential for improving naturalness in dialog, recommendation, and search systems. We create AltEntities (Alternative Entities), a new public dataset of 42K entity pairs and expressions (referring to one entity in the pair), and develop models for the disambiguation problem. Consisting of indirect referring expressions across three domains, our corpus enables for the first time the study of how language models can be adapted to this task. We find they achieve 82%-87% accuracy in realistic settings, which while reasonable also invites further advances.
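As a hedged sketch of the disambiguation setup (the names and scoring function are hypothetical): score the indirect expression against a short description of each candidate entity and pick the better match. A trivial word-overlap scorer stands in here for the adapted language models the paper studies.

```python
def compatibility(expression: str, description: str) -> int:
    # Trivial word-overlap stand-in for a learned (expression, entity) scorer.
    return len(set(expression.lower().split()) & set(description.lower().split()))

def resolve(expression, candidates):
    # candidates: mapping from entity name to a short description.
    return max(candidates, key=lambda e: compatibility(expression, candidates[e]))

print(resolve("let's make the green one",
              {"Simnel cake": "a fruit cake topped with yellow marzipan",
               "Pandan cake": "a light green chiffon sponge cake"}))
# -> 'Pandan cake'
```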
Text-Blueprint: An Interactive Platform for Plan-based Conditional Generation
Fantine Huot
Reinald Kim Amplayo
Mirella Lapata
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations (2023)
While conditional generation models can now generate natural language well enough to create fluent text, it is still difficult to control the generation process, leading to irrelevant, repetitive, and hallucinated content. Recent work shows that planning can be a useful intermediate step to render conditional generation less opaque and more grounded. We present a web browser-based demonstration for query-focused summarization that uses a sequence of question-answer pairs as a blueprint plan for guiding text generation (i.e., what to say and in what order). We illustrate how users may interact with the generated text and associated plan visualizations, e.g., by editing and modifying the blueprint in order to improve or control the generated output.
In a discourse, specific entities that are mentioned can later be referred to by a more general description. For example, 'Celine Dion' and 'Justin Bieber' can be referred to by 'Canadian singers' or 'celebrities'. In this work, we study this phenomenon in the context of summarization, where entities drawn from a source text are generalized in the summary. We call such instances 'source-summary entity aggregations'. We categorize and study several types of source-summary entity aggregations in the CNN/Dailymail corpus, showing that they are reasonably frequent. We experimentally analyze the capabilities of three state-of-the-art summarization systems for generating such aggregations within summaries. We also explore how they can be encouraged to generate more aggregations. Our results show that there is significant room for improvement in generating semantically correct and appropriate aggregations.
"I’d rather just go to bed”: Understanding Indirect Answers
Dan Roth
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP) (2020)
We revisit a pragmatic inference problem in dialog: understanding indirect responses to questions. Humans can interpret 'I'm starving.' in response to 'Hungry?', even without direct cue words such as 'yes' and 'no'. In dialog systems, allowing natural responses rather than closed vocabularies would be similarly beneficial. However, today's systems are only as sensitive to these pragmatic moves as their language model allows. We create and release the first large-scale English language corpus 'Circa' with 34,268 (polar question, indirect answer) pairs to enable progress on this task. The data was collected via elaborate crowd-sourcing, and contains utterances with yes/no meaning, as well as uncertain, middle-ground, and conditional responses. We also present BERT-based neural models to predict such categories for a question-answer pair. We find that while transfer learning from entailment works reasonably well, performance is not yet sufficient for robust dialog. Our models reach 82-88% accuracy for a 4-class distinction, and 74-85% for 6 classes.
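A minimal sketch of such a classifier, assuming Hugging Face Transformers, 'bert-base-uncased', and an illustrative four-label set (the paper's exact labels and setup may differ): the question and answer are encoded as one paired input and classified jointly.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["yes", "no", "uncertain", "middle-ground"]  # illustrative label set
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))

# The polar question and the indirect answer form one paired input,
# analogous to a premise-hypothesis pair in entailment.
inputs = tok("Hungry?", "I'm starving.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(LABELS[logits.argmax(-1).item()])  # untrained head: arbitrary label
```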
TESA: A Task in Entity Semantic Aggregation for Abstractive Summarization
Clement Jumel
Jackie C. K. Cheung
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, pp. 8031-8050
Human-written texts contain frequent generalizations and semantic aggregation of content. In a document, they may refer to a pair of named entities such as ‘London’ and ‘Paris’ with different expressions: “the major cities”, “the capital cities” and “two European cities”. Yet generation systems, especially abstractive summarization systems, have so far focused heavily on paraphrasing and simplifying the source content, to the exclusion of such semantic abstraction capabilities. In this paper, we present a new dataset and task aimed at the semantic aggregation of entities. TESA contains 5.3K crowd-sourced entity aggregations of Person, Organization, and Location named entities. The aggregations are document-appropriate, meaning that they are produced by annotators to match the situational context of a given news article from the New York Times. We then build baseline models for generating aggregations given a tuple of entities and document context. We finetune an encoder-decoder language model on TESA and compare it with simpler classification methods based on linguistically informed features. Our quantitative and qualitative evaluations show reasonable performance in making a choice from a given list of expressions, but free-form expressions are understandably harder to generate and evaluate.
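As a hedged sketch of such a generation baseline (the input template and 't5-small' are assumptions, not the paper's setup), an encoder-decoder model receives the entity tuple plus document context and decodes an aggregation:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Hypothetical input template: entity tuple followed by document context.
prompt = ("aggregate entities: London ; Paris. "
          "context: The two capitals announced a joint cultural programme.")
ids = tok(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=8)
print(tok.decode(out[0], skip_special_tokens=True))
# After finetuning on such data, one would hope for e.g. 'the capital cities'.
```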
Where should I comment my code? A dataset and model for predicting locations which need comments
Earl T. Barr
Michael Ernst
Santanu Dash
International Conference on Software Engineering (ICSE) (2020)
It is important to write code comments. Programmers should not comment every line of code: doing so would clutter the code, and programmers do not have time to do so in any event. Programmers must judiciously decide where to write code comments. We have created a machine learning model that suggests locations where a programmer should write a code comment. We trained it on existing high-quality commented code to learn the locations chosen by developers. Once trained, the model can predict locations in new code. We find that our models achieve good accuracy on this task, but there is substantial scope for future improvement.
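A toy sketch of one way to frame the task, under the assumption that it can be cast as binary classification over code lines (the paper's actual model and features will differ):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus: code lines labeled 1 where developers wrote a comment above.
lines = ["for i in range(len(xs)):", "return x + y",
         "cache.invalidate(key)", "i += 1"]
needs_comment = [1, 0, 1, 0]

clf = make_pipeline(CountVectorizer(token_pattern=r"(?u)\w+"),
                    LogisticRegression())
clf.fit(lines, needs_comment)
print(clf.predict(["retry_with_backoff(request)"]))  # 1 = suggest a comment
```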