Google Research

Training Deeper Neural Machine Translation Models with Transparent Attention

EMNLP (2018)

Abstract

While current state-of-the-art NMT models, both LSTM based and Transformers, are much deeper compared to their early counterparts, they are still shallow in comparison to convolutional models used for both text and vision applications. In this work we attempt to train significantly (2-3x) deeper transformer and BiLSTM encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models, and results in significant improvements on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work