Jake Garrison
I was born in Spokane, Washington, and attended the University of Washington for both undergraduate and graduate school. I built my own electric car when I was about 16, and that project served as a catalyst for my educational interests in college.
During my undergraduate years, I studied electrical engineering with a focus on power electronics and battery management for electric cars, as well as analog audio circuits and digital signal processing. I was also an early researcher in autonomous driving using multimodal deep neural networks, and I developed various AI-driven apps and software in startup-style environments.
For graduate school, I joined the UW UbiComp Lab, where I researched novel health sensing on mobile devices. My master's thesis was on sound-based lung function testing. I continue this line of research at Google Health.
Authored Publications
Optimizing Audio Augmentations for Contrastive Learning of Health-Related Acoustic Signals
Louis Blankemeier
Sebastien Baur
Diego Ardila
arXiv (2023)
Towards Accurate Differential Diagnosis with Large Language Models
Daniel McDuff
Anil Palepu
Amy Wang
Karan Singhal
Yash Sharma
Kavita Kulkarni
Le Hou
Sara Mahdavi
Sushant Prakash
Anupam Pathak
Shwetak Patel
Ewa Dominowska
Juro Gottweis
Joelle Barral
Kat Chou
Jake Sunshine
arXiv (2023)
Abstract
An accurate differential diagnosis (DDx) is a cornerstone of medical care, often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by Large Language Models (LLMs) present new opportunities to both assist and automate aspects of this process. In this study, we introduce an LLM optimized for diagnostic reasoning, and evaluate its ability to generate a DDx alone or as an aid to clinicians. 20 clinicians evaluated 302 challenging, real-world medical cases sourced from the New England Journal of Medicine (NEJM) case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: either assistance from search engines and standard medical resources, or LLM assistance in addition to these tools. All clinicians provided a baseline, unassisted DDx prior to using the respective assistive tools. Our LLM for DDx exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% vs 33.6%, [p = 0.04]). Comparing the two assisted study arms, the DDx quality score was higher for clinicians assisted by our LLM (top-10 accuracy 51.7%) compared to clinicians without its assistance (36.1%) (McNemar's Test: 45.7, p < 0.01) and clinicians with search (44.4%) (4.75, p = 0.03). Further, clinicians assisted by our LLM arrived at more comprehensive differential lists than those without its assistance. Our study suggests that our LLM for DDx has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.
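For illustration only, here is a minimal sketch of how a top-k accuracy metric like the top-10 figures above could be computed over ranked differential lists. The exact-string matching and the example cases are simplifications of my own; in the study, matches against the reference diagnosis were judged by clinicians rather than by string equality.

```python
def top_k_accuracy(ranked_ddx_lists, reference_diagnoses, k=10):
    """Fraction of cases whose reference diagnosis appears in the top k of the ranked list.

    ranked_ddx_lists: list of candidate-diagnosis lists, best candidate first.
    reference_diagnoses: ground-truth diagnosis for each case.
    """
    hits = 0
    for candidates, reference in zip(ranked_ddx_lists, reference_diagnoses):
        # Simplification: real evaluation uses expert raters, not exact string matching.
        if reference in candidates[:k]:
            hits += 1
    return hits / len(reference_diagnoses)


# Hypothetical example: one of two cases has its reference diagnosis in the top 10.
lists = [["sarcoidosis", "lymphoma", "tuberculosis"], ["asthma", "copd"]]
refs = ["lymphoma", "pulmonary embolism"]
print(top_k_accuracy(lists, refs, k=10))  # 0.5
```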
Abstract
Learned speech representations can drastically improve performance on tasks with limited labeled data. However, due to their size and complexity, learned representations have limited utility in mobile settings where run-time performance can be a significant bottleneck. In this work, we propose a class of lightweight speech embedding models that run efficiently on mobile devices based on the recently proposed TRILL speech embedding. We combine novel architectural modifications with existing speedup techniques to create embedding models that are fast enough to run in real-time on a mobile device and exhibit minimal performance degradation on a benchmark of non-semantic speech tasks. One such model (FRILL) is 32x faster on a Pixel 1 smartphone and 40% the size of TRILL, with an average decrease in accuracy of only 2%. To our knowledge, FRILL is the highest quality non-semantic embedding designed for use on mobile devices. Furthermore, we demonstrate that these representations are useful for mobile health tasks such as non-speech human sounds detection and face-masked speech detection. Our training and evaluation code is publicly available.
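As a rough sketch of the kind of lightweight embedding model described above, the example below builds a small depthwise-separable convolutional encoder over log-mel patches. The framework (PyTorch), layer sizes, and input shape are assumptions made for the example; this is not the actual TRILL or FRILL architecture or training recipe.

```python
import torch
import torch.nn as nn

class TinySpeechEmbedding(nn.Module):
    """Hypothetical mobile-friendly non-semantic speech embedding.

    Input: log-mel spectrogram patches of shape (batch, 1, n_mels, n_frames).
    Output: a fixed-size embedding vector per patch.
    """

    def __init__(self, embed_dim=96):
        super().__init__()

        def sep_conv(cin, cout):
            # Depthwise-separable convolution: cheap to run on mobile CPUs.
            return nn.Sequential(
                nn.Conv2d(cin, cin, kernel_size=3, padding=1, groups=cin),
                nn.Conv2d(cin, cout, kernel_size=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.encoder = nn.Sequential(
            sep_conv(1, 16), sep_conv(16, 32), sep_conv(32, 64),
            nn.AdaptiveAvgPool2d(1),  # global average pool over time and frequency
        )
        self.proj = nn.Linear(64, embed_dim)

    def forward(self, x):
        h = self.encoder(x).flatten(1)
        return self.proj(h)


# Example: embed a batch of log-mel patches (64 mel bins x 96 frames, shapes assumed).
model = TinySpeechEmbedding()
patches = torch.randn(8, 1, 64, 96)
print(model(patches).shape)  # torch.Size([8, 96])
```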
Whosecough: In-the-Wild Cougher Verification Using Multitask Learning
Matt Whitehill
Shwetak Patel
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 896-900
Abstract
Current automatic cough counting systems can determine how many coughs are present in an audio recording. However, they cannot determine who produced the cough. This limits their usefulness as most systems are deployed in locations with multiple people (i.e., a smart home device in a four-person home). Previous models trained solely on speech performed reasonably well on forced coughs [1]. By incorporating coughs into the training data, the model performance should improve. However, since limited natural cough data exists, training on coughs can lead to model overfitting. In this work, we overcome this problem by using multitask learning, where the second task is speaker verification. Our model achieves 82.15% classification accuracy amongst four users on a natural, in-the-wild cough dataset, outperforming human evaluators on average by 9.82%.
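For illustration, here is a minimal sketch of the multitask setup described above: a shared audio encoder feeding a cougher-identification head and an auxiliary speaker-verification head, trained with a weighted sum of losses. The GRU encoder, layer sizes, loss weight, and feature dimensions are assumptions for the example and are not taken from the paper.

```python
import torch
import torch.nn as nn

class MultitaskCougherVerifier(nn.Module):
    """Hypothetical multitask model: one shared encoder, two task heads."""

    def __init__(self, n_features=40, hidden=128, n_users=4, n_speakers=100):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.cough_head = nn.Linear(hidden, n_users)        # who produced the cough
        self.speaker_head = nn.Linear(hidden, n_speakers)   # auxiliary speaker-verification task

    def forward(self, frames):
        # frames: (batch, time, n_features); use the final hidden state as a clip embedding.
        _, h = self.encoder(frames)
        h = h.squeeze(0)
        return self.cough_head(h), self.speaker_head(h)


def multitask_loss(cough_logits, speaker_logits, cough_labels, speaker_labels, alpha=0.5):
    # Weighted sum of the two cross-entropy losses; alpha is an assumed, tunable weight.
    ce = nn.functional.cross_entropy
    return ce(cough_logits, cough_labels) + alpha * ce(speaker_logits, speaker_labels)


# Example forward/backward pass on random stand-in data.
model = MultitaskCougherVerifier()
x = torch.randn(16, 50, 40)            # 16 clips, 50 frames, 40 features each
cough_y = torch.randint(0, 4, (16,))
speaker_y = torch.randint(0, 100, (16,))
loss = multitask_loss(*model(x), cough_y, speaker_y)
loss.backward()
```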
Abstract
Prior work has shown that smartphone spirometry can effectively measure lung function using the phone's built-in microphone and could one day play a critical role in making spirometry more usable, accessible, and cost-effective. Although traditional spirometry is performed with the guidance of a medical expert, smartphone spirometry lacks the ability to provide the patient with feedback or guarantee the quality of a patient's spirometry efforts. Smartphone spirometry is particularly susceptible to poorly performed efforts because any sounds in the environment (e.g., a person's voice) or mistakes in the effort (e.g., coughs or short breaths) can invalidate the results. We introduce two approaches to analyze and estimate the quality of smartphone spirometry efforts. A gradient boosting model achieves 98.2% precision and 86.6% recall identifying invalid efforts when given expert tuned audio features, while a Gated-Convolutional Recurrent Neural Network achieves 98.3% precision and 88.0% recall and automatically develops patterns from a Mel-spectrogram, a more general audio feature.
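A minimal sketch of the gradient-boosting approach described above, using scikit-learn to flag invalid efforts and report precision and recall. The synthetic stand-in features, labels, and default hyperparameters are assumptions for the example; they are not the paper's expert-tuned audio features or model configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for per-effort audio features (e.g., spectral and energy
# statistics); labels mark efforts that should be rejected as invalid.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 20))
invalid = (features[:, 0] + 0.5 * features[:, 3] > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    features, invalid, test_size=0.25, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
pred = clf.predict(X_test)
print("precision:", precision_score(y_test, pred))
print("recall:", recall_score(y_test, pred))
```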