Semi-Supervision in ASR: Sequential Mixmatch and Factorized TTS-Based Augmentation

Zhehuai Chen
Yu Zhang
Yinghui Huang
Jesse Emond
Pedro Jose Moreno Mengibar
(2021)

Abstract

Semi- and self-supervised training techniques have the potential to improve the performance of speech recognition systems without additional transcribed speech data. In this work, we demonstrate the efficacy of two approaches to semi-supervision for automatic speech recognition (ASR). The two approaches leverage vast amounts of available unspoken text and untranscribed audio. First, we present factorized multilingual speech synthesis to improve data augmentation on unspoken text. Next, we present an online implementation of Noisy Student Training to incorporate untranscribed audio. We propose a modified Sequential MixMatch algorithm with iterative learning to learn from untranscribed speech. We demonstrate the compatibility of these techniques, yielding a relative word error rate reduction of up to 14.4% on the voice search task.
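The abstract references teacher-student pseudo-labeling (Noisy Student Training) on untranscribed audio. The following toy sketch illustrates only the general idea of one such step: a teacher labels untranscribed audio and a student learns from an augmented view of the same audio. All names here (ToyASRModel, augment, noisy_student_step) are illustrative placeholders and are not the paper's implementation of Sequential MixMatch or its online Noisy Student setup.

```python
# Hypothetical sketch of one online teacher-student (Noisy Student) step for
# ASR semi-supervision on untranscribed audio. Not the paper's code.
import random


def augment(features):
    """Toy augmentation: randomly zero out frames (SpecAugment-style masking)."""
    return [0.0 if random.random() < 0.1 else f for f in features]


class ToyASRModel:
    """Stand-in for a real ASR model; returns dummy transcripts and losses."""

    def transcribe(self, features):
        return "pseudo transcript"       # teacher's guessed label

    def train_step(self, features, transcript):
        return 0.0                       # placeholder training loss


def noisy_student_step(teacher, student, untranscribed_audio):
    # 1) Teacher pseudo-labels the (clean) untranscribed audio.
    pseudo_label = teacher.transcribe(untranscribed_audio)
    # 2) Student learns from a noised view of the same audio against the
    #    pseudo-label -- the consistency idea shared by MixMatch-style methods.
    return student.train_step(augment(untranscribed_audio), pseudo_label)


teacher, student = ToyASRModel(), ToyASRModel()
print(noisy_student_step(teacher, student, [0.3, -0.1, 0.7, 0.2]))
```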

Research Areas