Pan-Pan Jiang
Pan-Pan Jiang is a technical program manager focusing on accessibility and health projects. She previously worked at Verily, 23andMe, and the Broad Institute in basic and applied genomics research. She holds a B.Sc. in Biology from Queen's University, Canada, and a PhD in Evolutionary Biology from Harvard University.
Authored Publications
Automatic Speech Recognition of Conversational Speech in Individuals with Disordered Speech
Bob MacDonald
Rus Heywood
Richard Cave
Katie Seaver
Antoine Desjardins
Jordan Green
Journal of Speech, Language, and Hearing Research (2024) (to appear)
Purpose: This study examines the effectiveness of automatic speech recognition (ASR) for individuals with speech disorders, addressing the gap in performance between read and conversational ASR. We analyze the factors influencing this disparity and the effect of speech mode-specific training on ASR accuracy.
Method: Recordings of read and conversational speech from 27 individuals with various speech disorders were analyzed using both (1) one speaker-independent ASR system trained and optimized for typical speech and (2) multiple ASR models that were personalized to the speech of the participants with disordered speech. Word Error Rates (WERs) were calculated for each speech mode (read vs conversational) and each subject. Linear mixed-effect models were used to assess the impact of speech mode and disorder severity on ASR accuracy. We investigated nine variables, classified as technical, linguistic, or speech impairment factors, for their potential influence on the performance gap.
Results: We found a significant performance gap between read and conversational speech in both personalized and unadapted ASR models. Speech impairment severity notably impacted recognition accuracy in unadapted models for both speech modes and in personalized models for read speech. Linguistic attributes of utterances were the most influential on accuracy, though atypical speech characteristics also played a role. Including conversational speech samples in model training notably improved recognition accuracy.
Conclusions: We observed a significant performance gap in ASR accuracy between read and conversational speech for individuals with speech disorders. This gap was largely due to the linguistic complexity and unique characteristics of speech disorders in conversational speech. Training personalized ASR models using conversational speech significantly improved recognition accuracy, demonstrating the importance of domain-specific training and highlighting the need for further research into ASR systems capable of handling disordered conversational speech effectively.
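The Word Error Rate used as the accuracy measure above is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The following is a minimal sketch of that standard definition only; the function name and the whitespace tokenization are assumptions made for illustration, and the study's actual scoring pipeline and text normalization are not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper: tokenization is plain whitespace splitting,
    with no casing or punctuation normalization.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic dynamic-programming row).
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # Made-up example phrase, not taken from the study's data.
    print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
```

Because insertions count as errors, a WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.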
Speech Intelligibility Classifiers from 550k Disordered Speech Samples
Katie Seaver
Richard Cave
Neil Zeghidour
Rus Heywood
Jordan Green
ICASSP (2023)
We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale. We trained three models following different deep learning approaches and evaluated them on ∼94K utterances from 100 speakers. We further found that the models generalize well (without further training) to the TORGO database (100% accuracy), UASpeech (0.93 correlation), and ALS-TDI PMP (0.81 AUC) datasets, as well as to a dataset of realistic unprompted speech we gathered (106 dysarthric and 76 control speakers, ∼2300 samples).
View details
Preview abstract
Although personalized automatic speech recognition (ASR) models have recently been improved to recognize even severely impaired speech, model performance may degrade over time for persons with degenerating speech. The aims of this study were to (1) analyze the change of performance of ASR over time in individuals with degrading speech, and (2) explore mitigation strategies to optimize recognition throughout disease progression. Speech was recorded by four individuals with degrading speech due to amyotrophic lateral sclerosis (ALS). Word error rates (WER) across recording sessions were computed for three ASR models: Unadapted Speaker Independent (U-SI), Adapted Speaker Independent (A-SI), and Adapted Speaker Dependent (A-SD or personalized). The performance of all models degraded significantly over time as speech became more impaired, but the A-SD model improved markedly when updated with recordings from the severe stages of speech progression. Recording additional utterances early in the disease before significant speech degradation did not improve the performance of A-SD models. This emphasizes the importance of continuous recording (and model retraining) when providing personalized models for individuals with progressive speech impairments.
Automatic Speech Recognition of Disordered Speech: Personalized models outperforming human listeners on short phrases
Jordan R. Green
Bob MacDonald
Rus Heywood
Richard Cave
Katie Seaver
Marilyn Ladewig
Interspeech (2021) (to appear)
Objective. This study aimed to (1) evaluate the performance of personalized Automatic Speech Recognition (ASR) models on disordered speech samples representing a wide range of etiologies and speech severities, and (2) compare the accuracy of these models to that of speaker-independent ASR models developed on and for typical speech as well as expert human listeners.
Methods. 432 individuals with self-reported disordered speech recorded at least 300 short phrases using a web-based application. Word error rates (WER) were computed using three different ASR models and expert human transcribers. Metadata were collected to evaluate the potential impact of participant, atypical speech, and technical factors on recognition accuracy.
Results. The accuracy of personalized models for recognizing disordered speech was high (WER: 4.6%), and significantly better than speaker-independent models (WER: 31%). Personalized models also outperformed human transcribers (WER gain: 9%) with relative gains in accuracy as high as 80%. The most significant gain in recognition performance was for the most severely affected speakers. Low SNR and fewer training utterances adversely affected recognition even for speakers with mild speech impairments.
Conclusions. Personalized ASR models have significant potential for improving communication for persons with impaired speech.
Disordered Speech Data Collection: Lessons Learned at 1 Million Utterances from Project Euphonia
Bob MacDonald
Rus Heywood
Richard Cave
Katie Seaver
Marilyn Ladewig
Jordan R. Green
Interspeech (2021) (to appear)
Speech samples from over 1000 individuals with impaired speech have been submitted for Project Euphonia, aimed at improving automated speech recognition for atypical speech. We provide an update on the contents of the corpus, which recently passed 1 million utterances, and review key lessons learned from this project.
The reasoning behind decisions such as phrase set composition, prompted vs extemporaneous speech, metadata, and data quality efforts is explained based on findings from both technical and user-facing research.
Evaluation of the Use of Combined Artificial Intelligence and Pathologist Assessment to Review and Grade Prostate Biopsies
Kunal Nagpal
Davis J. Foote
Adam Pearce
Samantha Winter
Matthew Symonds
Liron Yatziv
Trissia Brown
Isabelle Flament-Auvigne
Fraser Tan
Martin C. Stumpe
Cameron Chen
Craig Mermel
JAMA Network Open (2020)
Importance: Expert-level artificial intelligence (AI) algorithms for prostate biopsy grading have recently been developed. However, the potential impact of integrating such algorithms into pathologist workflows remains largely unexplored.
Objective: To evaluate an expert-level AI-based assistive tool when used by pathologists for the grading of prostate biopsies.
Design, Setting, and Participants: This diagnostic study used a fully crossed multiple-reader, multiple-case design to evaluate an AI-based assistive tool for prostate biopsy grading. Retrospective grading of prostate core needle biopsies from 2 independent medical laboratories in the US was performed between October 2019 and January 2020. A total of 20 general pathologists reviewed 240 prostate core needle biopsies from 240 patients. Each pathologist was randomized to 1 of 2 study cohorts. The 2 cohorts reviewed every case in the opposite modality (with AI assistance vs without AI assistance) to each other, with the modality switching after every 10 cases. After a minimum 4-week washout period for each batch, the pathologists reviewed the cases for a second time using the opposite modality. The pathologist-provided grade group for each biopsy was compared with the majority opinion of urologic pathology subspecialists.
Exposure: An AI-based assistive tool for Gleason grading of prostate biopsies.
Main Outcomes and Measures: Agreement between pathologists and subspecialists with and without the use of an AI-based assistive tool for the grading of all prostate biopsies and Gleason grade group 1 biopsies.
Results: Biopsies from 240 patients (median age, 67 years; range, 39-91 years) with a median prostate-specific antigen level of 6.5 ng/mL (range, 0.6-97.0 ng/mL) were included in the analyses. Artificial intelligence–assisted review by pathologists was associated with a 5.6% increase (95% CI, 3.2%-7.9%; P < .001) in agreement with subspecialists (from 69.7% for unassisted reviews to 75.3% for assisted reviews) across all biopsies and a 6.2% increase (95% CI, 2.7%-9.8%; P = .001) in agreement with subspecialists (from 72.3% for unassisted reviews to 78.5% for assisted reviews) for grade group 1 biopsies. A secondary analysis indicated that AI assistance was also associated with improvements in tumor detection, mean review time, mean self-reported confidence, and interpathologist agreement.
Conclusions and Relevance: In this study, the use of an AI-based assistive tool for the review of prostate biopsies was associated with improvements in the quality, efficiency, and consistency of cancer detection and grading.