Conformer: Convolution-augmented Transformer for Speech Recognition

Anmol Gulati

Chung-Cheng Chiu

James Qin

Jiahui Yu

Niki Parmar

Ruoming Pang

Shibo Wang

Wei Han

Yonghui Wu

Yu Zhang

Zhengdong Zhang

(2020) (to appear)

Abstract

Recently, end-to-end Transformer and convolutional neural network (CNN) models have shown promising results in automatic speech recognition (ASR), outperforming recurrent neural networks (RNNs). In this work, we study how to combine convolutions and Transformers to model both the global interactions and the local patterns of an audio sequence in a parameter-efficient way. We propose a convolution-augmented Transformer for speech recognition, named Conformer. Conformer achieves state-of-the-art accuracy while remaining parameter-efficient, outperforming previous Transformer- and CNN-based ASR models. On the widely used LibriSpeech benchmark, our model achieves a word error rate (WER) of 2.1%/4.3% without a language model and 1.9%/3.9% with an external language model on the test-clean/test-other sets. Our small model with 10M parameters achieves 2.7%/6.3%.
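
The results above are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal Python sketch that computes WER for a single utterance as the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the authors' evaluation code, and benchmark scoring typically applies text normalization and aggregates errors over the whole test set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration only: tokens are split on
    whitespace and no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer:.1%}")
```

In this toy call one of six reference words is dropped, so the error count is 1 and the reference length is 6. A corpus-level figure would normally be obtained by summing errors and reference words over all utterances before dividing, rather than by averaging per-utterance rates.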

Research Areas

Speech Processing
