Google Research

Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages

  • ∀
  • Abdallah Bashir
  • Adewale Akinfaderin
  • Alp Öktem
  • Arshath Ramkilowan
  • Ayodele Olabiyi
  • Blessing Bassey
  • Blessing Sibanda
  • Bonaventure Dossou
  • Chris Emezue
  • Christopher Onyefuluchi
  • Daniel Whitenack
  • Elan van Biljon
  • Espoir Murhabazi
  • Ghollah Kioko
  • Goodness Duru
  • Hady Elsahar
  • Herman Kamper
  • Idris Abdulkadir Dangana
  • Ignatius Ezeani
  • Iroro Orife
  • Jade Abbott
  • Jamiil Toure Ali
  • Julia Kreutzer
  • Kathleen Siminyu
  • Kelechi Ogueji
  • Kevin Degila
  • Kolawole Tajudeen
  • Laura Jane Martinus
  • Lawrence Okegbemi
  • Masabata Mokgesi-Selinga
  • Mofetoluwa Adeyemi
  • Musie Meressa
  • Orevaoghene Ahia
  • Perez Ogayo
  • Ricky Macharm
  • Rubungo Andre Niyongabo
  • Sackey Freshia
  • Salomey Osei
  • Salomon Kabongo Kabenamualu
  • Shamsuddeen Muhammad
  • Solomon Oluwole Akinola
  • Taiwo Fagbohungbe
  • Timi Fasubaa
  • Tshinondiwa Matsila
  • Vukosi Marivate
  • Wilhelmina Nekoto
  • Wole Akin
Findings of the 2020 Conference on Empirical Methods in Natural Language Processing, Virtual (to appear)


Research in NLP lacks geographic diversity, and the question of how NLP can be scaled to low-resourced languages has not yet been adequately solved. "Low-resourced"-ness is a complex problem going beyond data availability, reflecting systemic problems in society. In this paper, we focus on the task of Machine Translation (MT), which plays a crucial role for information accessibility and communication worldwide. Despite immense improvements in MT over the past decade, MT is centered around a few high-resourced languages. As MT researchers cannot solve the problem of low-resourcedness alone, we propose participatory research as a means to involve all necessary agents required in the MT development process. We demonstrate the feasibility and scalability of participatory research with a case study on MT for African languages. Its implementation leads to a collection of novel translation datasets, MT benchmarks for over 30 languages with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution. Benchmarks, models, data, code, and evaluation results are released on GitHub.
