Leveraging Pre-trained Checkpoints for Sequence Generation Tasks

Sascha Rothe

Shashi Narayan

Aliaksei Severyn

Transactions of the Association for Computational Linguistics, 8(2020), pp. 264-280


Abstract

Pre-trained neural networks have become widely successful in Natural Language Processing. Training these large models on unsupervised data is costly and often not feasible. We therefore concentrate on publicly available checkpoints. While most of them improve Natural Language Understanding, we investigate initializing Transformer-based sequence-to-sequence models with these pre-trained checkpoints for both Natural Language Understanding and Generation. Using these pre-trained models, we achieve new state-of-the-art results on Machine Translation, Summarization, and Sentence Splitting/Fusion.
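
The idea of initializing a sequence-to-sequence model from an existing checkpoint can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that parameters are plain name-to-list mappings, that the encoder and decoder both copy the checkpoint weights, and that any decoder layer missing from the checkpoint (here a hypothetical `layer0.cross_attention`) is randomly initialized. The function name `warm_start` and all parameter names are made up for illustration.

```python
import random


def warm_start(checkpoint, decoder_only_sizes, seed=0):
    """Build toy encoder and decoder parameter dicts from one checkpoint.

    checkpoint: dict mapping parameter name -> list of floats (stand-in for tensors).
    decoder_only_sizes: dict mapping names of decoder parameters that are
        absent from the checkpoint -> number of values to initialize.
    """
    rng = random.Random(seed)
    # Encoder: copy every checkpoint parameter.
    encoder = {name: list(values) for name, values in checkpoint.items()}
    # Decoder: also start from the checkpoint (an assumption of this sketch).
    decoder = {name: list(values) for name, values in checkpoint.items()}
    # Parameters the checkpoint does not contain get small random values.
    for name, size in decoder_only_sizes.items():
        decoder[name] = [rng.gauss(0.0, 0.02) for _ in range(size)]
    return encoder, decoder


# Hypothetical two-parameter checkpoint, for illustration only.
checkpoint = {
    "embeddings": [0.1, 0.2],
    "layer0.self_attention": [0.3, 0.4, 0.5],
}
encoder, decoder = warm_start(checkpoint, {"layer0.cross_attention": 3})
```

After the call, `encoder` holds a copy of every checkpoint parameter, and `decoder` holds the same copies plus the newly initialized `layer0.cross_attention` values. In a real framework the same pattern would be applied to tensors, followed by fine-tuning on the target task.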

Research Areas

Machine Intelligence
Machine Translation
