Disordered Speech Data Collection: Lessons Learned at 1 Million Utterances from Project Euphonia

Bob MacDonald

Pan-Pan Jiang

Julie Cattiau

Rus Heywood

Richard Cave

Katie Seaver

Marilyn Ladewig

Jimmy Tobin

Michael Brenner

Philip Q Nelson

Jordan R. Green

Katrin Tomanek

Interspeech (2021) (to appear)

Abstract

Speech samples from over 1000 individuals with impaired speech have been submitted for Project Euphonia, which aims to improve automated speech recognition for atypical speech. We provide an update on the contents of the corpus, which recently passed 1 million utterances, and review key lessons learned from this project. The reasoning behind decisions such as phrase set composition, prompted vs. extemporaneous speech, metadata, and data quality efforts is explained based on findings from both technical and user-facing research.

Research Areas

Human-Computer Interaction and Visualization
Responsible AI
