Aliaksei Severyn

Authored Publications
Text-editing models have recently become a prominent alternative to seq2seq models for monolingual natural language generation (NLG) tasks such as grammatical error correction, text simplification, and style transfer. These tasks exhibit a large amount of textual overlap between the source and target texts. Text-editing models take advantage of this trait and learn to generate the output by predicting edit operations applied to the source sequence, in contrast to seq2seq models that generate the output from scratch. Text-editing models provide several benefits over seq2seq models, including faster inference speed, higher sample efficiency, and better control and interpretability of the outputs. This tutorial provides a comprehensive overview of text-editing approaches and current state-of-the-art models, analyzing the pros and cons of different methods. We discuss challenges related to productionization and how these models can help mitigate hallucination and bias, both pressing challenges in the field of text generation.
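To illustrate the edit-operation view described above, here is a minimal Python sketch (not taken from the tutorial) in which a target sentence is reconstructed from the source by predicting keep/delete tags and added phrases instead of generating every token from scratch; the tag set and the grammar-correction example are illustrative assumptions.

```python
# Illustrative sketch: reconstruct the target from the source with token-level edits.
# Each source token gets a (tag, phrase) pair; the phrase, if any, is inserted before it.

def apply_edits(source_tokens, edits):
    output = []
    for token, (tag, phrase) in zip(source_tokens, edits):
        if phrase:                      # phrase to insert before the current token
            output.extend(phrase.split())
        if tag == "KEEP":               # "DELETE" simply drops the source token
            output.append(token)
    return " ".join(output)

source = "He go to school yesterday .".split()
edits = [("KEEP", None), ("DELETE", "went"), ("KEEP", None),
         ("KEEP", None), ("KEEP", None), ("KEEP", None)]
print(apply_edits(source, edits))       # -> "He went to school yesterday ."
```

Because most tokens are simply kept, the model only needs to predict a small number of non-trivial edits, which is where the speed and sample-efficiency gains come from.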
We propose a new model for grammatical error correction (GEC) which builds on a very large multilingual masked language model covering 101 languages. To adapt our model to the GEC task, we design an unsupervised, language-agnostic pretraining objective that mimics corrections typically contained in labeled data. After finetuning on gold data, we surpass the previous state-of-the-art results on the four evaluated languages (Czech, English, German and Russian). This approach shows the power of large multilingual language models. Since these models are non-trivial to run outside cluster infrastructure, we also employ our model to clean up the labels in the popular yet noisy Lang-8 dataset. We release this cleaned dataset and hope that the community will find it useful for further advancement of GEC.
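The pretraining objective is described above only at a high level; the Python sketch below illustrates the general recipe of building synthetic (noisy, clean) pairs by corrupting clean text so a model can learn to undo error-like perturbations. The specific corruption operations and rates are assumptions for illustration, not the paper's recipe.

```python
# Generate a synthetic (corrupted, clean) training pair from clean text.
import random

def corrupt(tokens, drop_p=0.05, swap_p=0.05, dup_p=0.05):
    noisy, i = [], 0
    while i < len(tokens):
        r = random.random()
        if r < drop_p:                                   # delete a token
            i += 1
            continue
        if r < drop_p + swap_p and i + 1 < len(tokens):  # swap adjacent tokens
            noisy.extend([tokens[i + 1], tokens[i]])
            i += 2
            continue
        noisy.append(tokens[i])
        if r < drop_p + swap_p + dup_p:                  # duplicate a token
            noisy.append(tokens[i])
        i += 1
    return noisy

clean = "the quick brown fox jumps over the lazy dog".split()
pair = (corrupt(clean), clean)   # (source with synthetic errors, target)
```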
We propose MASKER, an unsupervised text-editing method for style transfer. To tackle cases where no parallel source–target pairs are available, we train masked language models (MLMs) for both the source and the target domain. Then we find the text spans where the two models disagree the most in terms of likelihood. This allows us to identify the source tokens to delete in order to transform the source text to match the style of the target domain. The deleted tokens are replaced using the target MLM, and by using a padded MLM variant, we avoid having to predetermine the number of inserted tokens. Our experiments on sentence fusion and sentiment transfer demonstrate that MASKER performs competitively in a fully unsupervised setting. Moreover, in low-resource settings, it improves supervised methods' accuracy by over 10 percentage points when pre-training them on silver training data generated by MASKER.
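A hedged Python sketch of the core scoring step: per-token likelihoods under a source-domain MLM and a target-domain MLM are compared, and the tokens where the two models disagree most become deletion candidates. The scoring functions and the toy sentiment example are stand-ins, not the paper's implementation.

```python
def disagreement_spans(tokens, src_logprob, tgt_logprob, top_k=2):
    """Indices of the top_k tokens the source MLM likes but the target MLM does not.
    src_logprob/tgt_logprob(tokens, i) return the masked-LM log-likelihood of token i."""
    scores = [(src_logprob(tokens, i) - tgt_logprob(tokens, i), i)
              for i in range(len(tokens))]
    return sorted(i for _, i in sorted(scores, reverse=True)[:top_k])

tokens = "the food was absolutely delicious".split()
# Toy stand-ins: a negative-style target MLM dislikes positive sentiment words.
src_lp = lambda toks, i: 0.0
tgt_lp = lambda toks, i: -5.0 if toks[i] in {"absolutely", "delicious"} else 0.0
print(disagreement_spans(tokens, src_lp, tgt_lp))   # -> [3, 4]
```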
We present FELIX, a flexible text-editing approach for generation, designed to derive maximum benefit from the ideas of decoding with bi-directional contexts and self-supervised pre-training. In contrast to conventional sequence-to-sequence (seq2seq) models, FELIX is efficient in low-resource settings and fast at inference time, while being capable of modeling flexible input-output transformations. We achieve this by decomposing the text-editing task into two sub-tasks: tagging, which decides on the subset of input tokens and their order in the output text, and insertion, which in-fills the output tokens that are not present in the input. The tagging model employs a novel Pointer mechanism, while the insertion model is based on a Masked Language Model. Both models are non-autoregressive to guarantee faster inference. FELIX performs favourably when compared to recent text-editing methods and strong seq2seq baselines on four NLG tasks: Sentence Fusion, Machine Translation Automatic Post-Editing, Summarization, and Text Simplification.
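The tagging-then-insertion decomposition can be sketched in Python as follows; the template construction and the MLM in-filling hook are simplified illustrations and not the released FELIX code.

```python
# Stage 1 keeps/reorders a subset of source tokens and marks insertion slots with [MASK];
# stage 2 fills every [MASK] independently (non-autoregressively) with a masked LM.

def build_template(source_tokens, kept_order, mask_slots):
    """kept_order: indices of kept source tokens in their output order.
    mask_slots: output positions where a [MASK] placeholder is inserted."""
    template = [source_tokens[i] for i in kept_order]
    for pos in sorted(mask_slots, reverse=True):
        template.insert(pos, "[MASK]")
    return template

def infill(template, mlm_predict):
    """mlm_predict(template, i) -> predicted token for the [MASK] at position i."""
    return [mlm_predict(template, i) if tok == "[MASK]" else tok
            for i, tok in enumerate(template)]

src = "The cat sat .".split()
template = build_template(src, kept_order=[0, 1, 3], mask_slots=[2])
# template == ["The", "cat", "[MASK]", "."]; the MLM then predicts the missing verb.
```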
    Leveraging Pre-trained Checkpoints for Sequence Generation Tasks
    Transactions of the Association for Computational Linguistics, 8 (2020), pp. 264-280
Pre-trained neural networks have become widely successful in Natural Language Processing. Training these large models on unsupervised data is costly and often not feasible, so we concentrate on publicly available checkpoints. While most of these checkpoints have been used to improve Natural Language Understanding, we investigate initializing Transformer-based sequence-to-sequence models with them for both Natural Language Understanding and Generation. Using these pre-trained checkpoints, we achieve new state-of-the-art results on Machine Translation, Summarization, and Sentence Splitting/Fusion.
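As a minimal sketch of the warm-starting idea, the snippet below uses the Hugging Face transformers library (an assumption; the paper's experiments use their own implementation and checkpoints) to initialize both the encoder and the decoder of a seq2seq model from a public BERT checkpoint.

```python
# Warm-start an encoder-decoder model from publicly available BERT checkpoints.
# Cross-attention weights in the decoder do not exist in BERT and are newly initialized.
from transformers import EncoderDecoderModel

model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased",   # encoder weights loaded from the checkpoint
    "bert-base-uncased",   # decoder weights loaded from the same checkpoint
)
```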
The softmax function on top of a final linear layer is the de facto method for outputting probability distributions in neural networks. In many applications, such as language models or text generation, these models have to produce distributions over large output vocabularies. Recently, this setup has been shown to have limited representational capacity due to its connection with the rank bottleneck in matrix factorization. However, little is known about the limitations of the linear-softmax layer for quantities of practical interest such as cross entropy or mode estimation, a direction theoretically and empirically explored in this paper. As an efficient and effective solution to alleviate this issue, we propose to learn parametric monotonic functions on top of the logits. Theoretically, we show that such monotonic functions are likely to increase the rank of a matrix to its full rank. Empirically, our method improves over the traditional softmax-linear layer in both synthetic and real language model experiments with negligible time or memory overhead, while being comparable to the more computationally expensive mixture of softmax distributions.
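A hedged PyTorch sketch of the general idea of applying a learned, strictly increasing scalar function to each logit before the softmax; the particular parameterization below (a positive combination of tanh units plus the identity) is an illustrative assumption rather than the exact form used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicLogitTransform(nn.Module):
    """Applies the same learned monotonic function to every logit."""
    def __init__(self, num_units=8):
        super().__init__()
        self.a = nn.Parameter(torch.randn(num_units))  # output weights (kept positive)
        self.b = nn.Parameter(torch.randn(num_units))  # input scales (kept positive)
        self.c = nn.Parameter(torch.randn(num_units))  # biases

    def forward(self, logits):
        x = logits.unsqueeze(-1)                        # [..., vocab, 1]
        units = torch.tanh(F.softplus(self.b) * x + self.c)
        # Positive weights on increasing units plus the identity => strictly increasing.
        return (F.softplus(self.a) * units).sum(-1) + logits

logits = torch.randn(2, 32000)                          # [batch, vocab]
probs = torch.softmax(MonotonicLogitTransform()(logits), dim=-1)
```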
    Using Audio Transformations to Improve Comprehension in Voice Question Answering
    Johanne R. Trippas
    Hanna Silen
    Damiano Spina
    Crestani F. et al. (eds) Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019, Springer, Cham, pp. 164-170
Many popular form factors of digital assistants, such as Amazon Echo, Apple HomePod, or Google Home, enable the user to hold a conversation with these systems based only on the speech modality. The lack of a screen presents unique challenges: to satisfy the information need of a user, the presentation of the answer needs to be optimized for such voice-only interactions. In this paper, we propose a task of evaluating the usefulness of audio transformations (i.e., prosodic modifications) for voice-only question answering. We introduce a crowdsourcing setup where we evaluate the quality of our proposed modifications along multiple dimensions corresponding to the informativeness, naturalness, and ability of the user to identify key parts of the answer. We offer a set of prosodic modifications that highlight potentially important parts of the answer using various acoustic cues. Our experiments show that some of these modifications lead to better comprehension at the expense of only slightly degraded naturalness of the audio.
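As a hedged illustration of what a prosodic modification can look like in practice, the Python sketch below wraps the key part of a spoken answer in standard SSML emphasis, rate, and pause tags so a TTS engine highlights it acoustically. It illustrates the general idea only; the exact transformations evaluated in the paper may differ.

```python
# Illustrative sketch: mark up the key span of a spoken answer with SSML cues.
def emphasize_answer(prefix, key_span, suffix):
    return (
        "<speak>"
        f"{prefix} "
        '<break time="300ms"/>'
        f'<emphasis level="strong"><prosody rate="90%">{key_span}</prosody></emphasis>'
        f'<break time="300ms"/>{suffix}'
        "</speak>"
    )

print(emphasize_answer("The capital of Australia is", "Canberra", ", not Sydney."))
```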
We propose LaserTagger, a sequence tagging approach that casts text generation as a text editing task. Target texts are reconstructed from the inputs using three main edit operations: keeping a token, deleting it, and adding a phrase before the token. To predict the edit operations, we propose a novel model which combines a BERT encoder with an autoregressive Transformer decoder. This approach is evaluated on English text on four tasks: sentence fusion, sentence splitting, abstractive summarization, and grammar correction. LaserTagger achieves new state-of-the-art results on three of these tasks, performs comparably to a set of strong seq2seq baselines with a large number of training examples, and outperforms them when the number of examples is limited. Furthermore, we show that at inference time tagging can be more than two orders of magnitude faster than comparable seq2seq models, making it more attractive for running in a live environment.
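Complementing the edit-application sketch earlier on this page, the Python sketch below shows how keep/delete tags with attached phrases could be derived from a (source, target) pair by greedy alignment. It is a simplified illustration of the label-construction step, not the released LaserTagger implementation; a real implementation would typically restrict added phrases to a small vocabulary and choose a better alignment.

```python
def label_edits(source, target):
    """Greedy labeling: each source token gets a (tag, phrase-to-add-before-it) pair."""
    tags, t = [], 0
    for s_tok in source:
        added = []
        while t < len(target) and target[t] != s_tok:   # target tokens to insert
            added.append(target[t])
            t += 1
        if t < len(target):                             # match: keep the source token
            tag, t = "KEEP", t + 1
        else:                                           # no match left: delete it
            tag = "DELETE"
        tags.append((tag, " ".join(added) or None))
    leftover = " ".join(target[t:]) or None             # phrase appended after the source
    return tags, leftover

src = "Turing was born in 1912 . Turing died in 1954 .".split()
tgt = "Turing was born in 1912 and died in 1954 .".split()
print(label_edits(src, tgt))   # applying these tags to src reconstructs tgt exactly
```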
In this paper we study various flavors of variational autoencoders, address methodological issues in current neural text generation research, and close some gaps by answering a few natural questions about previously published studies.
The use of weak or noisy supervision signals, such as the output of heuristic methods or user click-through data, for training deep neural networks is increasing, in particular for tasks where an adequate amount of data with true labels is not available. In a semi-supervised setting, we can use a large set of data with weak labels to pretrain a neural network and then fine-tune the parameters with a small amount of data with true labels. However, these two independent stages do not leverage the full capacity of the clean information from true labels during pretraining. In this paper, we propose a semi-supervised learning method in which we train two neural networks in a multi-task fashion: a target network and a confidence network. The target network is optimized to perform a given task and is trained using a large set of unlabeled data that are weakly annotated. We propose to weight the gradient updates to the target network using the scores provided by the confidence network, which is trained on a small amount of supervised data. We thus prevent weight updates computed from noisy labels from harming the quality of the target network model. We evaluate our learning strategy on two different tasks: document ranking and sentiment classification. The results demonstrate that our approach not only enhances performance compared to the baselines but also speeds up the learning process from weak labels.
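A hedged PyTorch sketch of the core training idea: per-example losses on weakly labeled data are scaled by scores from a confidence network before the gradient step. The architectures, shapes, and exact weighting scheme are illustrative assumptions, and the supervised training of the confidence network on the small clean set is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the two networks (real models would be task-specific encoders).
target_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
confidence_net = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

def weighted_weak_loss(x, weak_labels):
    logits = target_net(x)
    per_example = F.cross_entropy(logits, weak_labels, reduction="none")
    with torch.no_grad():             # confidence scores gate the target-network updates
        weights = confidence_net(x).squeeze(-1)
    return (weights * per_example).mean()

x = torch.randn(16, 128)              # a batch of weakly labeled examples
weak_labels = torch.randint(0, 2, (16,))
loss = weighted_weak_loss(x, weak_labels)
loss.backward()                       # low-confidence examples contribute small gradients
```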