Daan van Esch

I work on internationalization for language technology at Google, harnessing machine learning and scalable infrastructure to bring support for new languages to products like Gboard and the Assistant. Our world has a wealth of linguistic diversity and it's a fascinating research challenge to build technology across so many different languages.

Research Areas

Authored Publications

Google Publications

Other Publications

Now You See Me, Now You Don't: 'Poverty of the Stimulus' Problems and Arbitrary Correspondences in End-to-End Speech Models

Daan van Esch

Proceedings of the Second Workshop on Computation and Written Language (CAWL) 2024 (to appear)

LinguaMeta: Unified Metadata for Thousands of Languages

Sandy Ritchie

Daan van Esch

Uche Okonkwo

Shikhar Vashishth

Emily Drummond

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (to appear)

Connecting Language Technologies with Rich, Diverse Data Sources Covering Thousands of Languages

Daan van Esch

Sandy Ritchie

Sebastian Ruder

Julia Kreutzer

Clara Rivera

Ishank Saxena

Isaac Caswell

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (to appear)

Multimodal Language Identification

Shikhar Bharadwaj

Min Ma

Shikhar Vashishth

Ankur Bapna

Sriram (Sri) Ganapathy

Vera Axelrod

Sid Dalmia

Wei Han

Yu Zhang

Daan van Esch

Sandy Ritchie

Partha Talukdar

Jason Riesa

Proceedings of 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024) (2024)

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

Julia Kreutzer

Isaac Caswell

Lisa Wang

Ahsan Wahab

Daan van Esch

Nasanbayar Ulzii-Orshikh

Allahsera Auguste Tapo

Nishant Subramani

Artem Sokolov

Claytone Sikasote

Monang Setyawan

Supheakmungkol Sarin

Sokhar Samb

Benoît Sagot

Clara E. Rivera

Annette Rios

Isabel Papadimitriou

Salomey Osei

Pedro Javier Ortiz Suárez

Iroro Fred Ọ̀nọ̀mẹ̀ Orife

Kelechi Ogueji

Rubungo Andre Niyongabo

Toan Nguyen

Mathias Müller

André Müller

Shamsuddeen Hassan Muhammad

Nanda Muhammad

Ayanda Mnyakeni

Jamshidbek Mirzakhalov

Tapiwanashe Matangira

Colin Leong

Nze Lawson

Sneha Kudugunta

Yacine Jernite

Mathias Jenny

Orhan Firat

Bonaventure F. P. Dossou

Sakhile Dlamini

Nisansa de Silva

Sakine Çabuk Ballı

Stella Biderman

Alessia Battisti

Ahmed Baruwa

Ankur Bapna

Pallavi Baljekar

Israel Abebe Azime

Ayodele Awokoya

Duygu Ataman

Orevaoghene Ahia

Oghenefego Ahia

Sweta Agrawal

Mofetoluwa Adeyemi

TACL (2022)

XTREME-S: Evaluating Cross-lingual Speech Representations

Ankur Bapna

Clara E. Rivera

Daan van Esch

Jason Riesa

Jon Clark

Melvin Johnson

Mihir Sanjay Kale

Min Ma

Orhan Firat

Sandy Ritchie

Sebastian Ruder

Simran Khanuja

Ye Jia

Yu Zhang

Proc. Interspeech 2022

Writing System and Speaker Metadata for 2,800+ Language Varieties

Daan van Esch

Tamar Lucassen

Sebastian Ruder

Isaac Caswell

Clara E. Rivera

Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France (2022), pp. 5035-5046

Handling Compounding in Mobile Keyboard Input

Andreas Christian Kabel

Keith B. Hall

Tom Ouyang

David Rybach

Daan van Esch

Françoise Simone Beaufays

arXiv cs.CL (2022)

Managing Transcription Data for Automatic Speech Recognition with Elpis

Ben Foley

Daan van Esch

Nay San

The Open Handbook of Linguistic Data Management, The MIT Press (2022)

Building Machine Translation Systems for the Next Thousand Languages

Ankur Bapna

Isaac Caswell

Julia Kreutzer

Orhan Firat

Daan van Esch

Aditya Siddhant

Mengmeng Niu

Pallavi Nikhil Baljekar

Xavier Garcia

Wolfgang Macherey

Theresa Breiner

Vera Saldinger Axelrod

Jason Riesa

Yuan Cao

Mia Chen

Klaus Macherey

Maxim Krikun

Pidong Wang

Alexander Gutkin

Apu Shah

Yanping Huang

Zhifeng Chen

Yonghui Wu

Macduff Richard Hughes

Google Research (2022)

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data

Alëna Aksënova

Zhehuai Chen

Chung-Cheng Chiu

Daan van Esch

Pavel Golik

Wei Han

Levi King

Bhuvana Ramabhadran

Andrew Rosenberg

Suzan Schwartz

Gary Wang

(2022)

Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

Sandy Ritchie

You-Chi Cheng

Mingqing Chen

Rajiv Mathews

Daan van Esch

Bo Li

Khe Chai Sim

(2022)

How Might We Create Better Benchmarks for Speech Recognition?

Alëna Aksënova

Daan van Esch

James Flynn

Pavel Golik

ACL-IJCNLP 2021 Workshop on Benchmarking: Past, Present and Future (2021)

A Large Scale Low-Resource Pronunciation Data Set Mined From Wikipedia

Tania Chakraborty

Manasa Prasad

Theresa Breiner

Sandy Ritchie

Daan van Esch

arXiv cs.CL (2021)

Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus

Isaac Caswell

Theresa Breiner

Daan van Esch

Ankur Bapna

COLING (2020)

Data-Driven Parametric Text Normalization: Rapidly Scaling Finite-State Transduction Verbalizers to New Languages

Sandy Ritchie

Eoin Mahon

Kim Anne Heiligenstein

Nikos Bampounis

Daan van Esch

Christian Schallhart

Jonas Fromseier Mortensen

Benoit Brard

Proceedings of the 1st Joint SLTU and CCURL Workshop (SLTU-CCURL 2020), Language Resources and Evaluation Conference (LREC 2020), Marseille, 218–225

Future Directions in Technological Support for Language Documentation

Daan van Esch

Ben Foley

Nay San

Proceedings of the 3rd Workshop on the Use of Computational Methods in the Study of Endangered Languages (ComputEL-3) (2019)

Unified Verbalization for Speech Recognition & Synthesis Across Languages

Sandy Ritchie

Richard Sproat

Kyle Gorman

Daan van Esch

Christian Schallhart

Nikos Bampounis

Benoit Brard

Jonas Fromseier Mortensen

Millie Holt

Eoin Mahon

Proceedings of Interspeech 2019

Automatic Keyboard Layout Design for Low-Resource Latin-Script Languages

Theresa Breiner

Chieu Nguyen

Daan van Esch

Jeremy O'Brien

arXiv cs.CL (2019)

Writing Across the World's Languages: Deep Internationalization for Gboard, the Google Keyboard

Daan van Esch

Elnaz Sarbar

Tamar Lucassen

Jeremy O'Brien

Theresa Breiner

Manasa Prasad

Evan Elizabeth Crew

Chieu Nguyen

Francoise Beaufays

arXiv cs.HC (2019)

Developing Pronunciation Models in New Languages Faster by Exploiting Common Grapheme-to-Phoneme Correspondences Across Languages

Harry Bleyan

Sandy Ritchie

Jonas Fromseier Mortensen

Daan van Esch

Proceedings of Interspeech 2019

Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data

Manasa Prasad

Daan van Esch

Sandy Ritchie

Jonas Fromseier Mortensen

Proceedings of Interspeech 2019

Text Normalization Infrastructure that Scales to Hundreds of Language Varieties

Mason Chua

Daan van Esch

Noah Coccaro

Eunjoon Cho

Sujeet Bhandari

Libin Jia

Proceedings of the 11th edition of the Language Resources and Evaluation Conference (2018)

Building Speech Recognition Systems for Language Documentation: The CoEDL Endangered Language Pipeline and Inference System

Ben Foley

Josh Arnold

Rolando Coto-Solano

Gautier Durantin

T. Mark Ellison

Daan van Esch

Scott Heath

František Kratochvíl

Zara Maxwell-Smith

David Nash

Ola Olsson

Mark Richards

Nay San

Hywel Stoakes

Nick Thieberger

Janet Wiles

Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2018)

Mining Training Data for Language Modeling across the World’s Languages

Manasa Prasad

Theresa Breiner

Daan van Esch

Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2018)

An Expanded Taxonomy of Semiotic Classes for Text Normalization

Daan van Esch

Richard Sproat

Proceedings of Interspeech 2017

Predicting Pronunciations with Syllabification and Stress with Recurrent Neural Networks

Daan van Esch

Mason Chua

Kanishka Rao

Proceedings of Interspeech 2016

No Results Found

Search on Google Scholar

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations  & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Daan van Esch

Research Areas

Join us

Defining the technology of today and tomorrow.

Philosophy

People

Teams

AI/ML Foundations & Capabilities

Algorithms & Optimization

Computing Paradigms

Responsible Human-Centric Technology

Science & Societal Impact

Projects

Publications

Resources

Shaping the future, together.

Student programs

Faculty programs

Conferences & events

Daan van Esch

Research Areas

Filter by:

Year

Team

Research Area

Join us

AI/ML Foundations  & Capabilities