Subhashini Venugopalan

I work on machine learning applications motivated by healthcare and the sciences. Some of my work pertains to improving speech recognition systems for users with impaired speech, and some to transfer learning for bio/medical data (e.g. detecting diabetic retinopathy, breast cancer). I have also developed methods to interpret such vision/audio models (model explanation) for medical applications. During my graduate studies, I applied natural language processing and computer vision techniques to generate descriptions of events depicted in videos and images. I am a key contributor to a number of works featured in the Healed through A.I. documentary. Please refer to my website (https://vsubhashini.github.io/) for more information and my Google Scholar page for an up-to-date list of my publications.
Authored Publications
    Preview abstract Saying more while typing less is the ideal we strive towards when designing assistive writing technology that can minimize effort. Complementary to efforts on predictive completions is the idea to use a drastically abbreviated version of an intended message, which can then be reconstructed using Language Models. This paper highlights the challenges that arise from investigating what makes an abbreviation scheme promising for a potential application. We hope that this can provide a guide for designing studies which consequently allow for fundamental insights on efficient and goal driven abbreviation strategies. View details
    Preview abstract We developed dysarthric speech intelligibility classifiers on 551,176 disordered speech samples contributed by a diverse set of 468 speakers, with a range of self-reported speaking disorders and rated for their overall intelligibility on a five-point scale. We trained three models following different deep learning approaches and evaluated them on ∼94K utterances from 100 speakers. We further found the models to generalize well (without further training) on the TORGO database (100% accuracy), UASpeech (0.93 correlation), ALS-TDI PMP (0.81 AUC) datasets as well as on a dataset of realistic unprompted speech we gathered (106 dysarthric and 76 control speakers, ∼2300 samples). View details
    SpeakFaster Observer: Long-Term Instrumentation of Eye-Gaze Typing for Measuring AAC Communication
    Richard Jonathan Noel Cave
    Bob MacDonald
    Jon Campbell
    Blair Casey
    Emily Kornman
    Daniel Vance
    Jay Beavers
    CHI23 Case Studies of HCI in Practice (2023) (to appear)
    Preview abstract Accelerating communication for users with severe motor and speech impairments, in particular for eye-gaze Augmentative and Alternative Communication (AAC) device users, is a long-standing area of research. However, observation of such users' communication over extended durations has been limited. This case study presents the real-world experience of developing and field-testing a tool for observing and curating the gaze typing-based communication of a consented eye-gaze AAC user with amyotrophic lateral sclerosis (ALS) from the perspective of researchers at the intersection of HCI and artificial intelligence (AI). With the intent to observe and accelerate eye-gaze typed communication, we designed a tool and a protocol called the SpeakFaster Observer to measure everyday conversational text entry by the consenting gaze-typing user, as well as several consenting conversation partners of the AAC user. We detail the design of the Observer software and data curation protocol, along with considerations for privacy protection. The deployment of the data protocol from November 2021 to April 2022 yielded a rich dataset of gaze-based AAC text entry in everyday context, consisting of 130+ hours of gaze keypresses and 5.5k+ curated speech utterances from the AAC user and the conversation partners. We present the key statistics of the data, including the speed (8.1±3.9 words per minute) and keypress saving rate (-0.18±0.87) of gaze typing, patterns of utterance repetition and reuse, as well as the temporal dynamics of conversation turn-taking in gaze-based communication. We share our findings and also open-source our data collection tools for furthering research in this domain. View details
    Preview abstract Recent advances in self-supervision have dramatically improved the quality of speech representations. However, wide deployment of state-of-the-art embedding models on devices has been severely restricted due to their limited public availability and large resource footprint. Our work addresses these by publicly releasing a collection of paralinguistic speech models that are small and achieve near state-of-the-art performance. Our approach is based on knowledge distillation, and our models are distilled only on public data. We explore different architectures and thoroughly evaluate our models on the Non-Semantic Speech (NOSS) benchmark. Our largest distilled model is less than 16% the size of the original model (340MB vs 2.2GB) and achieves over 94% of the accuracy on 6 of 7 tasks. The smallest model is less than 0.3% in size (22MB) and achieves over 90% of the accuracy on 6 of 7 tasks. View details
    Preview abstract Amyotrophic Lateral Sclerosis (ALS) disease progression is usually measured using the subjective, questionnaire-based revised ALS Functional Rating Scale (ALSFRS-R). A purely objective measure for tracking disease progression would be a powerful tool for evaluating real-world drug effectiveness, efficacy in clinical trials, as well as identifying participants for cohort studies. Here we develop a machine learning based objective measure for ALS disease progression, based on voice samples and accelerometer measurements. The ALS Therapy Development Institute (ALS-TDI) collected a unique dataset of voice and accelerometer samples from consented individuals - 584 people living with ALS over four years. Participants carried out prescribed speaking and limb-based tasks. 542 participants contributed 5814 voice recordings, and 350 contributed 13009 accelerometer samples, while simultaneously measuring ALSFRS-R. Using the data from 475 participants, we trained machine learning (ML) models, correlating voice with bulbar-related FRS scores and accelerometer with limb-related scores. On the test set (n=109 participants) the voice models achieved an AUC of 0.86 (95% CI, 0.847-0.884), whereas the accelerometer models achieved a median AUC of 0.73. We used the models and self-reported ALSFRS-R scores to evaluate the real-world effects of edaravone, a drug recently approved for use in ALS, on 54 test participants. In the test cohort, the digital data input into the ML models produced objective measures of progression rates over the duration of the study that were consistent with self-reported scores. This demonstrates the value of these tools for assessing both disease progression and potentially drug effects. In this instance, outcomes from edaravone treatment, both self-reported and digital-ML, were highly variable from person to person. View details
    Assessing ASR Model Quality on Disordered Speech using BERTScore
    Qisheng Li
    Katie Seaver
    Richard Jonathan Noel Cave
    Proc. 1st Workshop on Speech for Social Good (S4SG) (2022), pp. 26-30 (to appear)
    Preview abstract Word Error Rate (WER) is the primary metric used to assess automatic speech recognition (ASR) model quality. It has been shown that ASR models tend to have much higher WER on speakers with speech impairments than typical English speakers. It is hard to determine if models can be useful at such high error rates. This study investigates the use of BERTScore, an evaluation metric for text generation, to provide a more informative measure of ASR model quality and usefulness. Both BERTScore and WER were compared to prediction errors manually annotated by Speech Language Pathologists for error type and assessment. BERTScore was found to be more correlated with human assessment of error type and assessment. BERTScore was specifically more robust to orthographic changes (contraction and normalization errors) where meaning was preserved. Furthermore, BERTScore was a better fit of error assessment than WER, as measured using an ordinal logistic regression and the Akaike Information Criterion (AIC). Overall, our findings suggest that BERTScore can complement WER when assessing ASR model performance from a practical perspective, especially for accessibility applications where models are useful even at lower accuracy than for typical speech. View details
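The abstract above notes that WER penalizes orthographic changes such as contractions even when meaning is preserved. A minimal sketch of word-level WER, assuming the standard edit-distance definition with lowercasing and whitespace tokenization (the function name and normalization choices are illustrative, not taken from the paper), shows the effect:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# A contraction keeps the meaning but costs one substitution and one deletion.
contraction_wer = word_error_rate("i do not know", "i don't know")
print(contraction_wer)
```

Here two of the four reference words count as errors although a listener would read both transcripts the same way, which is the kind of case where an embedding-based metric could be more informative.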
    Context-Aware Abbreviation Expansion Using Large Language Models
    Ajit Narayanan
    Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2022 (2022) (to appear)
    Preview abstract Motivated by the need for accelerating text entry in augmentative and alternative communication (AAC) for people with severe motor impairments, we propose a paradigm in which phrases are abbreviated aggressively as primarily word-initial letters. Our approach is to expand the abbreviations into full-phrase options by leveraging conversation context with the power of pretrained large language models (LLMs). Through zero-shot, few-shot, and fine-tuning experiments on four public conversation datasets, we show that for replies to the initial turn of a dialog, an LLM with 64B parameters is able to exactly expand over 70% of phrases with abbreviation length up to 10, leading to an effective keystroke saving rate of up to about 77% on these exact expansions. Including a small amount of context in the form of a single conversation turn more than doubles abbreviation expansion accuracies compared to having no context, an effect that is more pronounced for longer phrases. Additionally, the robustness of models against typo noise can be enhanced through fine-tuning on noisy data. View details
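The word-initial abbreviation scheme described above can be illustrated with a short sketch. It assumes whitespace tokenization and one simple way to define keystroke saving rate, namely one minus the ratio of abbreviation length to full phrase length; the paper may account for additional keystrokes (for example, selecting among expansion options), and the function names and example phrase are hypothetical.

```python
def abbreviate(phrase: str) -> str:
    """Abbreviate a phrase to the lowercase initial letter of each word."""
    return "".join(word[0].lower() for word in phrase.split())


def keystroke_saving_rate(phrase: str, abbreviation: str) -> float:
    """Fraction of keystrokes saved by typing the abbreviation instead.

    Assumes each character, including spaces, costs one keystroke and
    ignores any keystrokes needed to pick the right expansion.
    """
    return 1.0 - len(abbreviation) / len(phrase)


phrase = "I saw him playing in the bedroom"
abbreviation = abbreviate(phrase)
saving = keystroke_saving_rate(phrase, abbreviation)
print(abbreviation, round(saving, 3))
```

Under these assumptions a seven-word phrase of 32 characters shrinks to seven keystrokes; recovering the full phrase from those letters is the part left to the language model and the conversation context.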
    Scaling Symbolic Methods using Gradients for Neural Model Explanation
    Subham Sekhar Sahoo
    Li Li
    Rishabh Singh
    Patrick Francis Riley
    International Conference on Learning Representations (ICLR) (2021)
    Preview abstract Symbolic techniques based on Satisfiability Modulo Theory (SMT) solvers have been proposed for analyzing and verifying neural network properties, but their usage has been fairly limited owing to their poor scalability with larger networks. In this work, we propose a technique for combining gradient-based methods with symbolic techniques to scale such analyses and demonstrate its application for model explanation. In particular, we apply this technique to identify minimal regions in an input that are most relevant for a neural network's prediction. Our approach uses gradient information (based on Integrated Gradients) to focus on a subset of neurons in the first layer, which allows our technique to scale to large networks. The corresponding SMT constraints encode the minimal input mask discovery problem such that after masking the input, the activations of the selected neurons are still above a threshold. After solving for the minimal masks, our approach scores the mask regions to generate a relative ordering of the features within the mask. This produces a saliency map which explains "where a model is looking" when making a prediction. We evaluate our technique on three datasets - MNIST, ImageNet, and Beer Reviews, and demonstrate both quantitatively and qualitatively that the regions generated by our approach are sparser and achieve higher saliency scores compared to the gradient-based methods alone. View details
    Preview abstract Automatic classification of disordered speech can provide an objective tool for identifying the presence and severity of a speech impairment. Classification approaches can also help identify hard-to-recognize speech samples to teach ASR systems about the variable manifestations of impaired speech. Here, we develop and compare different deep learning techniques to classify the intelligibility of disordered speech on selected phrases. We collected samples from a diverse set of 661 speakers with a variety of self-reported disorders speaking 29 words or phrases, which were rated by speech-language pathologists for their overall intelligibility using a five-point Likert scale. We then evaluated classifiers developed using 3 approaches: (1) a convolutional neural network (CNN) trained for the task, (2) classifiers trained on non-semantic speech representations from CNNs that used an unsupervised objective [1], and (3) classifiers trained on the acoustic (encoder) embeddings from an ASR system trained on typical speech [2]. We find that the ASR encoder’s embeddings considerably outperform the other two on detecting and classifying disordered speech. Further analysis shows that the ASR embeddings cluster speech by the spoken phrase, while the non-semantic embeddings cluster speech by speaker. Also, longer phrases are more indicative of intelligibility deficits than single words. View details
    Guided Integrated Gradients: An Adaptive Path Method for Removing Noise
    Besim Namik Avci
    Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 5050-5058
    Preview abstract Integrated Gradients (IG) is a commonly used feature attribution method for deep neural networks. While IG has many desirable properties, when applied to visual models, the method often produces spurious/noisy pixel attributions in regions that are not related to the predicted class. While this has been previously noted, most existing solutions are aimed at addressing the symptoms by explicitly reducing the noise in the resulting attributions. In this work, we show that one of the causes of the problem is the presence of "adversarial examples" along the IG path. To minimize the effect of adversarial examples on attributions, we propose adapting the attribution path itself. We introduce Adaptive Path Methods (APMs), as a generalization of path methods, and Guided IG as a specific instance of an APM. Empirically, Guided IG creates saliency maps better aligned with the model's prediction and the input image that is being explained. We show through qualitative and quantitative experiments that Guided IG outperforms IG on ImageNet, Open Images, and diabetic retinopathy medical images. View details
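For readers unfamiliar with the baseline method, the sketch below approximates plain Integrated Gradients along the straight-line path from a baseline to the input, using a midpoint Riemann sum. It is not Guided IG, which adapts the path; the toy function, its analytic gradient, and all names are assumptions chosen so the example runs without a deep learning framework.

```python
from typing import Callable, List


def integrated_gradients(
    grad_fn: Callable[[List[float]], List[float]],
    x: List[float],
    baseline: List[float],
    steps: int = 50,
) -> List[float]:
    """Approximate straight-line Integrated Gradients with a midpoint sum."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = grad_fn(point)
        for i, g in enumerate(grad):
            total[i] += g
    # Scale the average gradient by the input difference, per feature.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


def f(x: List[float]) -> float:
    """Toy stand-in for a model output: sum of squares."""
    return sum(v * v for v in x)


def grad_f(x: List[float]) -> List[float]:
    """Analytic gradient of the toy function."""
    return [2.0 * v for v in x]


x = [1.0, -2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(grad_f, x, baseline)
print(attributions, f(x) - f(baseline))
```

The attributions sum to the difference between the output at the input and at the baseline, which is the completeness property that path methods share.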
    Preview abstract We study the attribution problem (cf. [SVZ13]) for deep networks applied to perception tasks. Traditionally, the attribution problem is formulated as blaming the network's prediction on the pixels of the input image, i.e., the space dimension. Often, signal is also present in the scale/frequency dimension. We propose a new technique called Blur Integrated Gradients that produces attributions in both space and in scale. Furthermore, we use the scale-space axioms (cf. [Lindeberg]) to argue that the input perturbations used by Blur Integrated Gradients will not accidentally create features. The resulting explanations are cleaner, and more faithful to how deep networks operate. We compare against some previously proposed techniques and demonstrate applications on three tasks: ImageNet object recognition, Diabetic Retinopathy prediction, and AudioSet audio event identification. View details
    Predicting the risk of developing diabetic retinopathy using deep learning
    Ashish Bora
    Siva Balasubramanian
    Sunny Virmani
    Akinori Mitani
    Guilherme De Oliveira Marinho
    Jorge Cuadros
    Dr. Paisan Raumviboonsuk
    Lily Hao Yi Peng
    Avinash Vaidyanathan Varadarajan
    Lancet Digital Health (2020)
    Preview abstract Background: Diabetic retinopathy screening is instrumental to preventing blindness, but scaling up screening is challenging because of the increasing number of patients with all forms of diabetes. We aimed to create a deep-learning system to predict the risk of patients with diabetes developing diabetic retinopathy within 2 years. Methods: We created and validated two versions of a deep-learning system to predict the development of diabetic retinopathy in patients with diabetes who had had teleretinal diabetic retinopathy screening in a primary care setting. The input for the two versions was either a set of three-field or one-field colour fundus photographs. Of the 575 431 eyes in the development set 28 899 had known outcomes, with the remaining 546 532 eyes used to augment the training process via multitask learning. Validation was done on one eye (selected at random) per patient from two datasets: an internal validation (from EyePACS, a teleretinal screening service in the USA) set of 3678 eyes with known outcomes and an external validation (from Thailand) set of 2345 eyes with known outcomes. Findings: The three-field deep-learning system had an area under the receiver operating characteristic curve (AUC) of 0·79 (95% CI 0·77–0·81) in the internal validation set. Assessment of the external validation set—which contained only one-field colour fundus photographs—with the one-field deep-learning system gave an AUC of 0·70 (0·67–0·74). In the internal validation set, the AUC of available risk factors was 0·72 (0·68–0·76), which improved to 0·81 (0·77–0·84) after combining the deep-learning system with these risk factors (p<0·0001). In the external validation set, the corresponding AUC improved from 0·62 (0·58–0·66) to 0·71 (0·68–0·75; p<0·0001) following the addition of the deep-learning system to available risk factors. 
    Interpretation: The deep-learning systems predicted diabetic retinopathy development using colour fundus photographs, and the systems were independent of and more informative than available risk factors. Such a risk stratification tool might help to optimise screening intervals to reduce costs while improving vision-related outcomes. View details
    Scientific Discovery by Generating Counterfactuals using Image Translation
    Arunachalam Narayanaswamy
    Lily Hao Yi Peng
    Dr. Paisan Raumviboonsuk
    Avinash Vaidyanathan Varadarajan
    Proceedings of MICCAI, International Conference on Medical Image Computing and Computer-Assisted Intervention (2020)
    Preview abstract Visual recognition models are increasingly applied to scientific domains such as drug studies and medical diagnoses, and model explanation techniques play a critical role in understanding the source of a model's performance and making its decisions transparent. In this work we investigate if explanation techniques can also be used as a mechanism for scientific discovery. We make two contributions: first, we propose a framework to convert predictions from explanation techniques to a mechanism of discovery. Second, we show how generative models in combination with black-box predictors can be used to generate hypotheses (without human priors) that can be critically examined. With these techniques we study classification models on retinal fundus images predicting Diabetic Macular Edema (DME). Essentially deep convolutional models on 2D retinal fundus images can do nearly as well as ophthalmologists looking at 3D scans, making this an interesting case study of clinical relevance. Our work highlights that while existing explanation tools are useful, they do not necessarily provide a complete answer. With the proposed framework we are able to bridge the gap between a model's performance and human understanding of the underlying mechanism, which is of vital scientific interest. View details
    GAN-Mediated Batch Equalization
    Arun Narayanaswamy
    Cassandra Xia
    Mike Ando
    Wesley Qian
    bioRxiv (2020)
    Preview abstract Advances in automation and imaging have made it possible to capture large image datasets for experiments that span multiple weeks (i.e. batches). However, almost all images experience batch-to-batch variation due to uncontrollable noise (e.g. different stain intensity or illumination conditions), and such complications make it difficult to make biological comparisons across all of the conditions spanning multiple batches. To address the batch variation in these images, we developed a batch equalization method that can transfer images between batches (style) while preserving the semantic content of the image (i.e. the biological phenotype), and by equalizing all the images to the same batch, we can effectively mediate the batch variation and highlight the biological variation. The equalization method is trained as a generative adversarial network (GAN), which has been quite successful in doing style transfer for consumer images. By incorporating a new feature disentanglement objective, our batch equalization GAN is able to reduce the batch variation observed in the images and at the same time maintain the biological features that correlate with the treatment conditions. View details
    Predicting OCT-derived DME grades from fundus photographs using deep learning
    Arunachalam Narayanaswamy
    Avinash Vaidyanathan Varadarajan
    Dr. Paisan Raumviboonsuk
    Dr. Peranut Chotcomwongse
    Jorge Cuadros
    Lily Hao Yi Peng
    Pearse Keane
    Nature Communications (2020)
    Preview abstract Diabetic eye disease is one of the fastest growing causes of preventable blindness. With the advent of anti-VEGF therapies, it has become increasingly important to detect center-involved DME (ci-DME). However, ci-DME is diagnosed using optical coherence tomography (OCT), which is not generally available at screening sites. Instead, screening programs rely on the detection of hard exudates as a proxy for DME on color fundus photographs, but this often results in a fair number of false positive and false negative calls. We trained a deep learning model to use color fundus images to directly predict grades derived from OCT exams for DME. Our OCT-based model had an AUC of 0.89 (95% CI: 0.87-0.91), which corresponds to a sensitivity of 85% at a specificity of 80%. In comparison, the ophthalmology graders had sensitivities ranging from 82%-85% and specificities ranging from 44%-50%. These metrics correspond to a PPV of 61% (95% CI: 56%-66%) for the OCT-based algorithm and a range of 36-38% (95% CI ranging from 33%-42%) for ophthalmologists. In addition, we used multiple attention techniques to explain how the model is making its prediction. The ability of deep learning algorithms to make clinically relevant predictions that generally require sophisticated 3D-imaging equipment from simple 2D images has broad relevance to many other applications in medical imaging. View details
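The abstract above reports sensitivity, specificity, and PPV together. As a rough illustration of how they relate, the sketch below applies Bayes' rule to compute PPV from sensitivity, specificity, and disease prevalence. The prevalence value is a hypothetical placeholder, not a figure from the study, so the printed numbers are not expected to reproduce the reported PPVs.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """PPV = P(disease | positive call), derived from Bayes' rule."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical prevalence chosen only for illustration.
assumed_prevalence = 0.5

higher_specificity = positive_predictive_value(0.85, 0.80, assumed_prevalence)
lower_specificity = positive_predictive_value(0.85, 0.45, assumed_prevalence)
print(round(higher_specificity, 3), round(lower_specificity, 3))
```

At similar sensitivity, the grader with lower specificity produces more false positives and therefore a lower PPV, which matches the direction of the comparison in the abstract.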
    It's easy to fool yourself: Case studies on identifying bias and confounding in bio-medical datasets
    Arunachalam Narayanaswamy
    Anton Geraschenko
    Scott Lipnick
    Nina Makhortova
    James Hawrot
    Christine Marques
    Joao Pereira
    Lee Rubin
    Brian Wainger
    NeurIPS LMRL workshop 2019 (2019)
    Preview abstract Confounding variables are a well known source of nuisance in biomedical studies. They present an even greater challenge when we combine them with black-box machine learning techniques that operate on raw data. This work presents two case studies. In one, we discovered biases arising from systematic errors in the data generation process. In the other, we found a spurious source of signal unrelated to the prediction task at hand. In both cases, our prediction models performed well but under careful examination hidden confounders and biases were revealed. These are cautionary tales on the limits of using machine learning techniques on raw data from scientific experiments. View details
    Predicting Anemia from Fundus Images
    Akinori Mitani
    Abigail Huang
    Lily Peng
    Avinash Vaidyanathan Varadarajan
    Nature Biomedical Engineering (2019)
    Preview abstract Owing to the invasiveness of diagnostic tests for anaemia and the costs associated with screening for it, the condition is often undetected. Here, we show that anaemia can be detected via machine-learning algorithms trained using retinal fundus images, study participant metadata (including race or ethnicity, age, sex and blood pressure) or the combination of both data types (images and study participant metadata). In a validation dataset of 11,388 study participants from the UK Biobank, the fundus-image-only, metadata-only and combined models predicted haemoglobin concentration (in g dl–1) with mean absolute error values of 0.73 (95% confidence interval: 0.72–0.74), 0.67 (0.66–0.68) and 0.63 (0.62–0.64), respectively, and with areas under the receiver operating characteristic curve (AUC) values of 0.74 (0.71–0.76), 0.87 (0.85–0.89) and 0.88 (0.86–0.89), respectively. For 539 study participants with self-reported diabetes, the combined model predicted haemoglobin concentration with a mean absolute error of 0.73 (0.68–0.78) and anaemia with an AUC of 0.89 (0.85–0.93). Automated anaemia screening on the basis of fundus images could particularly aid patients with diabetes undergoing regular retinal imaging and for whom anaemia can increase morbidity and mortality risks. View details
    Applying Deep Neural Network Analysis to High-Content Image-Based Assays
    Scott L. Lipnick
    Nina R. Makhortova
    Minjie Fan
    Zan Armstrong
    Thorsten M. Schlaeger
    Liyong Deng
    Wendy K. Chung
    Liadan O'Callaghan
    Anton Geraschenko
    Dosh Whye
    Jon Hazard
    Arunachalam Narayanaswamy
    D. Michael Ando
    Lee L. Rubin
    SLAS DISCOVERY: Advancing Life Sciences R&D, vol. 0 (2019), pp. 2472555219857715
    Preview abstract The etiological underpinnings of many CNS disorders are not well understood. This is likely due to the fact that individual diseases aggregate numerous pathological subtypes, each associated with a complex landscape of genetic risk factors. To overcome these challenges, researchers are integrating novel data types from numerous patients, including imaging studies capturing broadly applicable features from patient-derived materials. These datasets, when combined with machine learning, potentially hold the power to elucidate the subtle patterns that stratify patients by shared pathology. In this study, we interrogated whether high-content imaging of primary skin fibroblasts, using the Cell Painting method, could reveal disease-relevant information among patients. First, we showed that technical features such as batch/plate type, plate, and location within a plate lead to detectable nuisance signals, as revealed by a pre-trained deep neural network and analysis with deep image embeddings. Using a plate design and image acquisition strategy that accounts for these variables, we performed a pilot study with 12 healthy controls and 12 subjects affected by the severe genetic neurological disorder spinal muscular atrophy (SMA), and evaluated whether a convolutional neural network (CNN) generated using a subset of the cells could distinguish disease states on cells from the remaining unseen control–SMA pair. Our results indicate that these two populations could effectively be differentiated from one another and that model selectivity is insensitive to batch/plate type. One caveat is that the samples were also largely separated by source. These findings lay a foundation for how to conduct future studies exploring diseases with more complex genetic contributions and unknown subtypes. View details