Training Deeper Neural Machine Translation Models with Transparent Attention

Ankur Bapna; Mia Chen; Orhan Firat; Yuan Cao; Yonghui Wu

EMNLP (2018)

Abstract

While current state-of-the-art NMT models, both LSTM-based and Transformer-based, are much deeper than their early counterparts, they are still shallow compared with the convolutional models used for text and vision applications. In this work we attempt to train significantly (2-3x) deeper Transformer and BiLSTM encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models and yields significant improvements on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.
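
The abstract does not spell out the modification to the attention mechanism. As a rough illustration only, the sketch below assumes that "transparent attention" means the decoder attends to a learned, softmax-weighted combination of all encoder layer outputs rather than to the top encoder layer alone. The function names, the nested-list data layout, and the scalar-per-layer scores are hypothetical choices made for this example, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_encoder_layers(layer_outputs, layer_scores):
    """Softmax-weighted sum of per-layer encoder outputs.

    layer_outputs: list of L layers; each layer is a list of T position
        vectors, and each vector is a list of D floats.
    layer_scores: list of L unnormalized scalars. In a real model these
        would be trainable parameters (an assumption of this sketch).
    """
    if len(layer_outputs) != len(layer_scores):
        raise ValueError("need exactly one score per encoder layer")
    weights = softmax(layer_scores)
    num_positions = len(layer_outputs[0])
    dim = len(layer_outputs[0][0])
    combined = [[0.0] * dim for _ in range(num_positions)]
    for weight, layer in zip(weights, layer_outputs):
        for t, vector in enumerate(layer):
            for d, value in enumerate(vector):
                combined[t][d] += weight * value
    return combined


# Toy input: two encoder layers, one source position, two dimensions.
layers = [[[1.0, 2.0]], [[3.0, 4.0]]]
# Equal scores give equal weights, so the result is the layer average.
print(combine_encoder_layers(layers, [0.0, 0.0]))  # [[2.0, 3.0]]
```

Under this assumption, the attention keys and values would be built from the combined representation, so gradients from the decoder reach every encoder layer directly instead of passing through the full encoder stack.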

Research Areas

Machine intelligence
