Sequence Transduction Using Span-level Edit Operations

Felix Stahlberg; Shankar Kumar

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, pp. 5147-5159

Abstract

We propose an open-vocabulary approach to sequence editing for natural language processing (NLP) tasks with a high degree of overlap between input and output texts. We represent sequence-to-sequence transduction as a sequence of edit operations, where each operation either replaces an entire source span with target tokens or keeps it unchanged. We test our method on five NLP tasks (text normalization, sentence fusion, sentence splitting and rephrasing, text simplification, and grammatical error correction) and report competitive results across the board. We show that our method has clear speed advantages over full sequence models for grammatical error correction because inference time depends on the number of edits rather than the number of target tokens. For text normalization, sentence fusion, and grammatical error correction, we associate each edit operation with a task-specific tag to improve explainability.
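To make the edit representation concrete, the following is a minimal sketch of how a sequence of span-level edits could be applied to a tokenized source. It is not the authors' implementation: the `Edit` class, its field names, the convention that each span starts where the previous one ended, and the tag names `SVA` and `ORTH` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Edit:
    """One span-level edit (hypothetical representation, not from the paper)."""
    end: int                                  # exclusive end index of the source span
    replacement: Optional[List[str]] = None   # None means "keep the span unchanged"
    tag: str = ""                             # optional task-specific label (assumed names)


def apply_edits(source: List[str], edits: List[Edit]) -> List[str]:
    """Apply edits left to right; each span starts where the previous one ended."""
    target: List[str] = []
    start = 0
    for edit in edits:
        if edit.end < start or edit.end > len(source):
            raise ValueError(f"invalid span end {edit.end} (current start {start})")
        if edit.replacement is None:
            # Keep operation: copy the source span verbatim.
            target.extend(source[start:edit.end])
        else:
            # Replace operation: emit the replacement tokens instead of the span.
            target.extend(edit.replacement)
        start = edit.end
    if start != len(source):
        raise ValueError("edits must cover the whole source sequence")
    return target


source = "he still dream to become a super hero .".split()
edits = [
    Edit(end=2),                                                   # keep "he still"
    Edit(end=5, replacement=["dreams", "of", "becoming"], tag="SVA"),
    Edit(end=6),                                                   # keep "a"
    Edit(end=8, replacement=["superhero"], tag="ORTH"),
    Edit(end=9),                                                   # keep "."
]
print(" ".join(apply_edits(source, edits)))
# he still dreams of becoming a superhero .
```

In this toy example the output has eight tokens but is described by only five edits, which illustrates why a model that predicts one edit per step can need fewer steps than a model that predicts one target token per step.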

Research Areas

Natural language processing
