Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview

Alena Butryna; Shan Hui Cathy Chu; Isin Demirsahin; Alexander Gutkin; Linne Ha; Fei He; Martin Jansche; Cibu C Johny; Anna Katanova; Oddur Kjartansson; Chen Fang Li; Tatiana Merkulova; Yin May Oo; Knot Pipatsrisawat; Clara E. Rivera; Supheakmungkol Sarin; Pasindu De Silva; Keshan Sodimana; Richard Sproat; Theeraphol Wattanavekin; Jaka Aris Eko Wibawa

2019 UNESCO International Conference Language Technologies for All (LT4All): Enabling Linguistic Diversity and Multilingualism Worldwide, 4-6 December, Paris, France, pp. 91-94

Abstract

This paper presents an overview of a program designed to address the growing need for free speech resources for under-represented languages. To date, we have released 38 datasets for building text-to-speech and automatic speech recognition applications for languages and dialects of South and Southeast Asia, Africa, Europe and South America. The paper describes the methodology used to develop these corpora and presents some of our findings that could benefit under-represented language communities.

Research Areas

Natural language processing
