Characterizing Dysarthria Diversity for Automatic Speech Recognition: A Tutorial From the Clinical Perspective
Abstract
Despite significant advancements in automatic speech recognition (ASR) technology, even the best-performing ASR systems are inadequate for speakers with impaired speech. This inadequacy may stem, in part, from the challenges of acquiring a sufficiently diverse training sample of disordered speech. Speakers with dysarthria, a group of divergent speech disorders secondary to neurologic injury, exhibit highly variable speech patterns both within and across individuals. This diversity is currently poorly characterized and, consequently, difficult to represent adequately in disordered speech ASR corpora. In this article, we consider the variable expressions of dysarthria within the context of established clinical taxonomies (e.g., the Darley, Aronson, and Brown dysarthria subtypes). We also briefly review past and recent efforts to capture this diversity quantitatively using speech analytics. Understanding dysarthria diversity from the clinical perspective, and how this diversity may affect ASR performance, could aid in (1) optimizing data collection strategies to minimize bias; (2) ensuring representative ASR training sets; and (3) improving the generalization of ASR to difficult-to-recognize speakers. Our overarching goal is to use clinical knowledge to facilitate the development of robust ASR systems for dysarthric speech.