Neural Machine Translation in Linear Time

Nal Kalchbrenner

Lasse Espeholt

Karen Simonyan

Aäron van den Oord

Alexander Graves

Koray Kavukcuoglu

arXiv (2016)

Abstract

We present a neural architecture for sequences, the ByteNet, that has two core features: it runs in time that is linear in the length of the sequences and it preserves the sequences' temporal resolution. The ByteNet is a stack of two dilated convolutional neural networks, one to encode the source and one to decode the target, where the target decoder unfolds dynamically to generate variable length outputs. We show that the ByteNet decoder attains state-of-the-art performance on character-level language modelling and outperforms recurrent neural networks. We also show that the ByteNet achieves a performance on raw character-level machine translation that approaches that of the best neural translation models that run in quadratic time. A visualization technique reveals the latent alignment structure learnt by the ByteNet.
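The two properties named in the abstract, linear running time and preserved temporal resolution, can be illustrated with a small sketch of stacked causal dilated convolutions. This is not the ByteNet implementation: the function names, the all-ones weights, the kernel size, and the dilation schedule are assumptions chosen for illustration, and residual blocks, normalization, and the dynamic unfolding of the decoder are omitted. Each layer returns a sequence of the same length as its input and does a fixed amount of work per position, while the receptive field grows with the sum of the dilation rates.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """Causal 1-D convolution with zero padding on the left.

    y[t] = sum_j weights[j] * x[t - (K - 1 - j) * dilation],
    so y[t] never reads inputs later than position t.
    """
    k = len(weights)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - (k - 1 - j) * dilation
            if idx >= 0:  # positions before the start count as zeros
                acc += w * x[idx]
        out.append(acc)
    return out  # same length as x: temporal resolution is kept


def receptive_field(kernel_size, dilations):
    """Number of input positions that one output position can see."""
    return 1 + (kernel_size - 1) * sum(dilations)


def dilated_stack(x, kernel_size, dilations):
    """Apply one causal layer per dilation rate (hypothetical all-ones weights)."""
    weights = [1.0] * kernel_size
    for d in dilations:
        x = causal_dilated_conv1d(x, weights, d)
    return x


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 7
    print(dilated_stack(impulse, kernel_size=2, dilations=[1, 2]))
    print(receptive_field(kernel_size=3, dilations=[1, 2, 4, 8, 16]))
```

With a sequence of length n, L layers, and kernel size K, the sketch performs n * K * L multiply-adds, which is linear in n for a fixed network. Doubling the dilation at each layer lets the receptive field grow exponentially with depth without shortening the sequence.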
