Julie Cattiau
Authored Publications
A Convolutional Neural Network for Automated Detection of Humpback Whale Song in a Diverse, Long-Term Passive Acoustic Dataset
Ann N. Allen
Matt Harvey
Karlina P. Merkens
Carrie C. Wall
Erin M. Oleson
Frontiers in Marine Science, 8 (2021), pp. 165
Passive acoustic monitoring is a well-established tool for researching the occurrence, movements, and ecology of a wide variety of marine mammal species. Advances in hardware and data collection have exponentially increased the volumes of passive acoustic data collected, such that discoveries are now limited by the time required to analyze rather than collect the data. In order to address this limitation, we trained a deep convolutional neural network (CNN) to identify humpback whale song in over 187,000 h of acoustic data collected at 13 different monitoring sites in the North Pacific over a 14-year period. The model successfully detected 75 s audio segments containing humpback song with an average precision of 0.97 and average area under the receiver operating characteristic curve (AUC-ROC) of 0.992. The model output was used to analyze spatial and temporal patterns of humpback song, corroborating known seasonal patterns in the Hawaiian and Mariana Islands, including occurrence at remote monitoring sites beyond well-studied aggregations, as well as novel discovery of humpback whale song at Kingman Reef, at 5° North latitude. This study demonstrates the ability of a CNN trained on a small dataset to generalize well to a highly variable signal type across a diverse range of recording and noise conditions. We demonstrate the utility of active learning approaches for creating high-quality models in specialized domains where annotations are rare. These results validate the feasibility of applying deep learning models to identify highly variable signals across broad spatial and temporal scales, enabling new discoveries through combining large datasets with cutting-edge tools.
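A minimal sketch of the kind of segment-level scoring and evaluation the abstract describes: a trained classifier scores fixed-length audio segments, and the segment scores are evaluated with average precision and AUC-ROC. The `model` object, sample rate, and helper names below are illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sketch: score 75 s segments of a long recording and compute the
# two metrics reported in the abstract. Sample rate and `model` are assumed.
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

SEGMENT_SECONDS = 75          # segment length used in the study
SAMPLE_RATE = 10_000          # assumed sample rate, for illustration only

def score_segments(model, audio: np.ndarray) -> np.ndarray:
    """Split a long recording into 75 s segments and score each one."""
    seg_len = SEGMENT_SECONDS * SAMPLE_RATE
    n_segments = len(audio) // seg_len
    segments = audio[: n_segments * seg_len].reshape(n_segments, seg_len)
    # model.predict is assumed to return P(humpback song) per segment
    return model.predict(segments)

def evaluate(scores: np.ndarray, labels: np.ndarray) -> dict:
    """Average precision and AUC-ROC over labeled segments."""
    return {
        "average_precision": average_precision_score(labels, scores),
        "auc_roc": roc_auc_score(labels, scores),
    }
```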
Automatic Speech Recognition of Disordered Speech: Personalized models outperforming human listeners on short phrases
Jordan R. Green
Bob MacDonald
Rus Heywood
Richard Cave
Katie Seaver
Marilyn Ladewig
Interspeech (2021) (to appear)
Objective. This study aimed to (1) evaluate the performance of personalized Automatic Speech Recognition (ASR) models on disordered speech samples representing a wide range of etiologies and speech severities, and (2) compare the accuracy of these models to that of speaker-independent ASR models developed on and for typical speech as well as expert human listeners. Methods. 432 individuals with self-reported disordered speech recorded at least 300 short phrases using a web-based application. Word error rates (WER) were computed using three different ASR models and expert human transcribers. Metadata were collected to evaluate the potential impact of participant, atypical speech, and technical factors on recognition accuracy. Results. The accuracy of personalized models for recognizing disordered speech was high (WER: 4.6%), and significantly better than speaker-independent models (WER: 31%). Personalized models also outperformed human transcribers (WER gain: 9%) with relative gains in accuracy as high as 80%. The most significant gain in recognition performance was for the most severely affected speakers. Low SNR and fewer training utterances adversely affected recognition even for speakers with mild speech impairments. Conclusions. Personalized ASR models have significant potential for improving communication for persons with impaired speech.
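The comparisons above all rest on word error rate (WER). As a hedged illustration of the metric (not the study's transcription pipeline), WER counts the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length; the example phrases below are made up.

```python
# Self-contained WER computation via word-level edit distance.
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("turn on the kitchen lights", "turn on the kitten lights"))  # 0.2
```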
Disordered Speech Data Collection: Lessons Learned at 1 Million Utterances from Project Euphonia
Bob MacDonald
Rus Heywood
Richard Cave
Katie Seaver
Marilyn Ladewig
Jordan R. Green
Interspeech (2021) (to appear)
Speech samples from over 1,000 individuals with impaired speech have been submitted to Project Euphonia, which aims to improve automated speech recognition for atypical speech. We provide an update on the contents of the corpus, which recently passed 1 million utterances, and review key lessons learned from this project. The reasoning behind decisions such as phrase set composition, prompted vs. extemporaneous speech, and metadata and data quality efforts is explained based on findings from both technical and user-facing research.
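As a purely illustrative sketch of the kind of per-utterance metadata and quality signals the abstract alludes to, the record below uses hypothetical field names and values; it is not the actual Project Euphonia schema.

```python
# Hypothetical per-utterance record: prompt, speech type, and quality flags.
from dataclasses import dataclass

@dataclass
class UtteranceRecord:
    speaker_id: str
    prompt_text: str            # phrase shown to the speaker, if prompted
    is_extemporaneous: bool     # prompted vs. extemporaneous speech
    audio_path: str
    snr_db: float               # simple signal quality estimate
    passed_quality_review: bool

record = UtteranceRecord(
    speaker_id="spk_0001",
    prompt_text="turn on the living room lights",
    is_extemporaneous=False,
    audio_path="audio/spk_0001/utt_0001.wav",
    snr_db=22.5,
    passed_quality_review=True,
)
```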
Personalizing ASR for Dysarthric and Accented Speech with Limited Data
Joel Shor
Interspeech 2019 (2019)
Automatic speech recognition (ASR) systems have dramatically improved over the last few years. ASR systems are most often trained from 'typical' speech, which means that underrepresented groups don't experience the same level of improvement. In this paper, we present and evaluate finetuning techniques to improve ASR for users with non-standard speech. We focus on two types of non-standard speech: speech from people with amyotrophic lateral sclerosis (ALS) and accented speech. We train personalized models that achieve 62% and 35% relative WER improvement on these two groups, bringing the absolute WER for ALS speakers, on a test set of message bank phrases, to 10% for mild dysarthria and 20% for more serious dysarthria. We show that 76% of the improvement comes from only 5 min of training data. Finetuning a particular subset of layers (with many fewer parameters) often gives better results than finetuning the entire model. This is the first step towards building state-of-the-art ASR models for dysarthric speech.