UserLibri: A Dataset for ASR Personalization with Only Text

Ehsan Variani; Khe Chai Sim; Kilol Gupta; Lara McConnaughey; Mingqing Chen; Rajiv Mathews; Shefali Garg; Swaroop Ramaswamy; Theresa Breiner

2022

Abstract

Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but more often than not, mobile devices have more text-only data than paired audio-text data. We explore training a personalized language model on text-only data, used during inference to improve speech recognition performance for that user. We experiment on a user-clustered LibriSpeech corpus, supplemented with personalized text-only data for each user from Project Gutenberg. We release this User-Specific LibriSpeech (UserLibri) dataset to aid future personalization research. LibriSpeech audio-transcript pairs are grouped into 55 users from the test-clean dataset and 52 users from test-other. We are able to lower the average word error rate per user across both sets in streaming and nonstreaming models, including a word error rate improvement of 2.5 for the harder set of test-other users when streaming.
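
The "average word error rate per user" reported above can be read as a per-user average: compute a WER for each user over that user's utterances, then average those values across users. The Python sketch below illustrates that reading. It is an assumption about how the metric is computed, not the authors' evaluation code, and the names `edit_distance` and `average_wer_per_user` are hypothetical.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def average_wer_per_user(examples):
    """Return (mean of per-user WERs in percent, dict of per-user WERs).

    `examples` is an iterable of (user_id, reference, hypothesis) strings.
    Assumption: each user's WER pools errors over all of that user's
    utterances, and users are then weighted equally.
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for user_id, ref, hyp in examples:
        ref_tokens, hyp_tokens = ref.split(), hyp.split()
        errors[user_id] += edit_distance(ref_tokens, hyp_tokens)
        words[user_id] += len(ref_tokens)
    per_user = {u: 100.0 * errors[u] / words[u] for u in words if words[u] > 0}
    if not per_user:
        raise ValueError("no reference words to score")
    return sum(per_user.values()) / len(per_user), per_user


if __name__ == "__main__":
    toy = [
        ("user_a", "the cat sat", "the cat sat"),
        ("user_b", "hello world", "hello word"),
        ("user_b", "good day", "good day"),
    ]
    avg, per_user = average_wer_per_user(toy)
    print(avg, per_user)
```

Under this reading, a user with few utterances counts as much as a user with many, so the result can differ from a WER pooled over all utterances in the test set.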
