- Ankur Bapna
- Clara E. Rivera
- Daan van Esch
- Jason Riesa
- Jon Clark
- Melvin Johnson
- Mihir Sanjay Kale
- Min Ma
- Orhan Firat
- Sandy Ritchie
- Sebastian Ruder
- Simran Khanuja
- Ye Jia
- Yu Zhang
Abstract
We introduce \xtremes, a new benchmark to evaluate universal cross-lingual speech representations in many languages. XTREME-S covers four task families: speech recognition, classification, retrieval and speech-to-text translation. Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in ``universal'' speech representation learning. This paper describes the new benchmark and establishes the first speech-only and speech-text baselines using XLS-R and mSLAM on all downstream tasks. We motivate the design choices and detail how to use the benchmark. The code and pre-processing scripts will be made publicly available.\footnote{\small\url{https://huggingface.co/datasets/google/xtreme_s}}
Research Areas
Learn more about how we do research
We maintain a portfolio of research projects, providing individuals and teams the freedom to emphasize specific types of work