Jake Garrison
I was born in Spokane, Washington, and attended the University of Washington for both undergraduate and graduate school. I built my own electric car when I was 16, and that project served as a catalyst for my educational interests in college.
During my undergraduate years, I studied electrical engineering with a focus on power electronics and battery management for electric cars, as well as analog audio circuits and digital signal processing. I was also an early researcher in autonomous driving using multimodal deep neural networks, and I developed various AI-driven apps and software in startup-style environments.
For graduate school, I joined the UW UbiComp Lab, where I researched novel health sensing on mobile devices. My thesis was on sound-based lung function testing, and I continue this line of research at Google Health.
Authored Publications
Optimizing Audio Augmentations for Contrastive Learning of Health-Related Acoustic Signals
Louis Blankemeier
Sebastien Baur
Diego Ardila
arXiv (2023)
Learned speech representations can drastically improve performance on tasks with limited labeled data. However, due to their size and complexity, learned representations have limited utility in mobile settings where run-time performance can be a significant bottleneck. In this work, we propose a class of lightweight speech embedding models that run efficiently on mobile devices based on the recently proposed TRILL speech embedding. We combine novel architectural modifications with existing speedup techniques to create embedding models that are fast enough to run in real-time on a mobile device and exhibit minimal performance degradation on a benchmark of non-semantic speech tasks. One such model (FRILL) is 32x faster on a Pixel 1 smartphone and 40% the size of TRILL, with an average decrease in accuracy of only 2%. To our knowledge, FRILL is the highest-quality non-semantic embedding designed for use on mobile devices. Furthermore, we demonstrate that these representations are useful for mobile health tasks such as non-speech human sounds detection and face-masked speech detection. Our training and evaluation code is publicly available.
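Lightweight student embeddings of this kind are typically trained to reproduce a frozen teacher's outputs (here, TRILL's). A minimal sketch of that embedding-regression objective, with illustrative shapes and loss choice rather than the paper's actual code:

```python
import numpy as np

def distillation_loss(teacher_emb, student_emb):
    """Mean squared error between a frozen teacher embedding (e.g. TRILL)
    and a lightweight student embedding (e.g. FRILL).

    The L2 regression target and the 512-dim shape below are illustrative
    assumptions, not details taken from the paper.
    """
    teacher = np.asarray(teacher_emb, dtype=np.float64)
    student = np.asarray(student_emb, dtype=np.float64)
    return float(np.mean((teacher - student) ** 2))

# A student that matches the teacher exactly incurs zero loss;
# any mismatch yields a positive penalty to minimize during training.
t = np.ones((4, 512))          # batch of 4 hypothetical teacher embeddings
s = np.ones((4, 512)) * 0.9    # slightly-off student embeddings
loss = distillation_loss(t, s)
```

The student network itself (the part that must run in real time on a phone) is then free to use a much smaller architecture, since only its output needs to match the teacher.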
Whosecough: In-the-Wild Cougher Verification Using Multitask Learning
Matt Whitehill
Shwetak Patel
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 896-900
Current automatic cough counting systems can determine how many coughs are present in an audio recording. However, they cannot determine who produced the cough. This limits their usefulness as most systems are deployed in locations with multiple people (i.e., a smart home device in a four-person home). Previous models trained solely on speech performed reasonably well on forced coughs [1]. By incorporating coughs into the training data, the model performance should improve. However, since limited natural cough data exists, training on coughs can lead to model overfitting. In this work, we overcome this problem by using multitask learning, where the second task is speaker verification. Our model achieves 82.15% classification accuracy amongst four users on a natural, in-the-wild cough dataset, outperforming human evaluators on average by 9.82%.
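A multitask objective of the shape described above is typically a weighted sum of the primary cougher-identification loss and the auxiliary speaker-verification loss. A minimal sketch, where the weight `alpha` and the use of plain softmax cross-entropy for both heads are illustrative assumptions, not the paper's configuration:

```python
import math
import numpy as np

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    logits = np.asarray(logits, dtype=np.float64)
    shifted = logits - logits.max()
    log_probs = shifted - math.log(np.exp(shifted).sum())
    return float(-log_probs[label])

def multitask_loss(cough_logits, cough_label, spk_logits, spk_label, alpha=0.5):
    """Weighted sum of the cougher-ID loss and the auxiliary
    speaker-verification loss. Sharing a backbone across both tasks
    lets abundant speech data regularize the scarce cough data."""
    return (cross_entropy(cough_logits, cough_label)
            + alpha * cross_entropy(spk_logits, spk_label))
```

The key idea is that both heads backpropagate into one shared encoder, so the model cannot overfit to the small natural-cough set without also degrading on the much larger speaker-verification task.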
Prior work has shown that smartphone spirometry can effectively measure lung function using the phone’s built-in microphone and could one day play a critical role in making spirometry more usable, accessible, and cost-effective. Although traditional spirometry is performed with the guidance of a medical expert, smartphone spirometry lacks the ability to provide the patient feedback or guarantee the quality of a patient’s spirometry efforts. Smartphone spirometry is particularly susceptible to poorly performed efforts because any sounds in the environment (e.g., a person’s voice) or mistakes in the effort (e.g., coughs or short breaths) can invalidate the results. We introduce two approaches to analyze and estimate the quality of smartphone spirometry efforts. A gradient boosting model achieves 98.2% precision and 86.6% recall identifying invalid efforts when given expert-tuned audio features, while a Gated-Convolutional Recurrent Neural Network achieves 98.3% precision and 88.0% recall and automatically develops patterns from a Mel-spectrogram, a more general audio feature.
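The precision and recall figures quoted above treat "invalid effort" as the positive class. A small helper showing how such metrics are computed; the example labels are made up for illustration, not drawn from the study's data:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision and recall with 'invalid effort' (True) as the positive class.

    precision = TP / (TP + FP): of efforts flagged invalid, how many were.
    recall    = TP / (TP + FN): of truly invalid efforts, how many were caught.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_pred & y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

High precision with lower recall, as reported for the gradient boosting model, means the classifier rarely rejects a good effort but lets some bad efforts through, a sensible trade-off when discarding a valid patient effort is costly.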