Google Research

Extending Parrotron: An End-to-End, Speech Conversion and Speech Recognition Model for Atypical Speech

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract

We present an extended Parrotron model: a single, end-to-end model that enables voice conversion and recognition simultaneously. Input spectrograms are transformed to output spectrograms in the voice of a predetermined target speaker while also generating hypotheses in the target vocabulary. We study the performance of this novel architecture that jointly predicts speech and text on atypical (‘dysarthric’) speech. We show that with as little as an hour of atypical speech, speaker adaptation can yield up to 67% relative reduction in Word Error Rate (WER). We also show that data augmentation using a customized synthesizer built on the atypical speech can provide an additional 10% relative improvement over the best speaker-adapted model. Finally, we show that these methods generalize across 8 dysarthria etiologies with a range of severities.

Learn more about how we do research

We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work