
A Voice-Activated Switch for Persons with Motor and Speech Impairments: Isolated-Vowel Spotting Using Neural Networks

Lisie Lillianfeld
Katie Seaver
Jordan R. Green
InterSpeech 2021 (2021)


Severe speech impairments limit the precision and range of producible speech sounds. As a result, generic automatic speech recognition (ASR) and keyword spotting (KWS) systems are unable to accurately recognize the utterances produced by individuals with severe speech impairments. This paper describes an approach in which simple speech sounds, namely isolated open vowels (e.g., /a/), are used in lieu of more motorically demanding keywords. A neural network (NN) is trained to detect these isolated open vowels, uttered by individuals with speech impairments, against background noise. The NN is trained in two phases. The pre-training phase uses samples from unimpaired speakers along with samples of background noise and unrelated speech; the fine-tuning phase then uses vowel samples collected from individuals with speech impairments. This model can be built into an experimental mobile app that allows users to activate preconfigured actions such as alerting caregivers. Preliminary user testing indicates that the model has the potential to be a useful and flexible emergency communication channel for individuals with motor and speech impairments.
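
The two-phase training described above can be illustrated with a minimal sketch. This is not the authors' implementation: a logistic-regression classifier stands in for the neural network, the two-dimensional "features" are synthetic Gaussian samples rather than audio, and every name, distribution, and hyperparameter (`make_samples`, `train`, the learning rates, the sample counts) is a hypothetical choice made for illustration. It only shows the general pattern: train on plentiful data from one population, then continue training the same weights on a small set from the target population.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_samples(rng, n, vowel_mean, noise_mean, spread):
    """Synthetic 2-D features: label 1 = vowel, label 0 = background."""
    data = []
    for _ in range(n):
        data.append(([rng.gauss(m, spread) for m in vowel_mean], 1))
        data.append(([rng.gauss(m, spread) for m in noise_mean], 0))
    return data


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(weights, bias, data, lr, epochs, rng):
    """Plain SGD on the logistic loss; returns updated (weights, bias)."""
    weights = list(weights)
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for x, y in order:
            grad = predict(weights, bias, x) - y
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def accuracy(weights, bias, data):
    hits = sum((predict(weights, bias, x) >= 0.5) == bool(y) for x, y in data)
    return hits / len(data)


rng = random.Random(0)
background = [-2.0, -2.0]

# Phase 1 data (assumed): many vowels from "unimpaired" speakers vs. background.
pretrain_set = make_samples(rng, 500, [2.0, 2.0], background, 1.0)
# Phase 2 data (assumed): few vowels from the target speakers, shifted in feature space.
finetune_set = make_samples(rng, 30, [-1.0, 2.5], background, 1.0)
# Held-out samples drawn from the target distribution.
target_test = make_samples(rng, 200, [-1.0, 2.5], background, 1.0)

# Phase 1: pre-train from scratch on the large, generic set.
w, b = train([0.0, 0.0], 0.0, pretrain_set, lr=0.1, epochs=5, rng=rng)
acc_before = accuracy(w, b, target_test)

# Phase 2: fine-tune the same parameters on the small target set,
# using a smaller learning rate so the pre-trained weights are adjusted gently.
w, b = train(w, b, finetune_set, lr=0.02, epochs=20, rng=rng)
acc_after = accuracy(w, b, target_test)

print(f"target accuracy after pre-training: {acc_before:.3f}")
print(f"target accuracy after fine-tuning:  {acc_after:.3f}")
```

Reusing the pre-trained weights as the starting point for the second call to `train` is what makes this fine-tuning rather than training a second model from scratch. The smaller learning rate in phase 2 is a common convention, not something the abstract specifies.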