Philip Nelson
Philip Nelson is a Director of Engineering in Google Research. He joined Google in 2008 and was previously responsible for a range of Google applications and geo services. He currently leads the Google Accelerated Science team that collaborates with external scientists to apply Google's knowledge and experience in running complex algorithms over large data sets to important scientific problems. Philip graduated from MIT in 1985 where he did award-winning research on hip prosthetics at Harvard Medical School. He helped found and lead several Silicon Valley start ups in search (Verity), optimization (Impresse), and genome sequencing (Complete Genomics) and was also an Entrepreneur in Residence at Accel Partners.
Research Areas
Authored Publications
Sort By
Automatic Speech Recognition of Conversational Speech in Individuals with Disordered Speech
Bob MacDonald
Rus Heywood
Richard Cave
Katie Seaver
Antoine Desjardins
Jordan Green
Journal of Speech, Language, and Hearing Research (2024) (to appear)
Preview abstract
Purpose: This study examines the effectiveness of automatic speech recognition (ASR) for individuals with speech disorders, addressing the gap in performance between read and conversational ASR. We analyze the factors influencing this disparity and the effect of speech mode-specific training on ASR accuracy.
Method: Recordings of read and conversational speech from 27 individuals with various speech disorders were analyzed using both (1) one speaker-independent ASR system trained and optimized for typical speech and (2) multiple ASR models that were personalized to the speech of the participants with disordered speech. Word Error Rates (WERs) were calculated for each speech mode, read vs conversational, and subject. Linear mixed-effect models were used to assess the impact of speech mode and disorder severity on ASR accuracy. We investigated nine variables, classified as technical, linguistic, or speech impairment factors, for their potential influence on the performance gap.
Results: We found a significant performance gap between read and conversational speech in both personalized and unadapted ASR models. Speech impairment severity notably impacted recognition accuracy in unadapted models for both speech modes and in personalized models for read speech. Linguistic attributes of utterances were the most influential on accuracy, though atypical speech characteristics also played a role. Including conversational speech samples in model training notably improved recognition accuracy.
Conclusions: We observed a significant performance gap in ASR accuracy between read and conversational speech for individuals with speech disorders. This gap was largely due to the linguistic complexity and unique characteristics of speech disorders in conversational speech. Training personalized ASR models using conversational speech significantly improved recognition accuracy, demonstrating the importance of domain-specific training and highlighting the need for further research into ASR systems capable of handling disordered conversational speech effectively.
View details
Automatic Speech Recognition of Disordered Speech: Personalized models outperforming human listeners on short phrases
Jordan R. Green
Bob MacDonald
Rus Heywood
Richard Cave
Katie Seaver
Marilyn Ladewig
Interspeech (2021) (to appear)
Preview abstract
Objective. This study aimed to (1) evaluate the performance of personalized Automatic Speech Recognition (ASR) models on disordered speech samples representing a wide range of etiologies and speech severities, and (2) compare the accuracy of these models to that of speaker-independent ASR models developed on and for typical speech as well as expert human listeners. Methods. 432 individuals with self-reported disordered speech recorded at least 300 short phrases using a web-based application. Word error rates (WER) were computed using three different ASR models and expert human transcribers. Metadata were collected to evaluate the potential impact of participant, atypical speech, and technical factors on recognition accuracy. Results. The accuracy of personalized models for recognizing disordered speech was high (WER: 4.6%), and significantly better than speaker-independent models (WER: 31%). Personalized models also outperformed human transcribers (WER gain: 9%) with relative gains in accuracy as high as 80%. The most significant gain in recognition performance was for the most severely affected speakers. Low SNR and fewer training utterances adversely affected recognition even for speakers with mild speech impairments. Conclusions. Personalized ASR models have significant potential for improving communication for persons with impaired speech.
View details
A Voice-Activated Switch for Persons with Motor and Speech Impairments: Isolated-Vowel Spotting Using Neural Networks
Lisie Lillianfeld
Katie Seaver
Jordan R. Green
InterSpeech 2021 (2021)
Preview abstract
Severe speech impairments limit the precision and range of producible speech sounds. As a result, generic automatic speech recognition (ASR) and keyword spotting (KWS) systems are unable to accurately recognize the utterances produced by individuals with severe speech impairments. This paper describes an approach in which simple speech sounds, namely isolated open vowels (e.g., /a/), are used in lieu of more motorically-demanding keywords. A neural network (NN) is trained to detect these isolated open vowels uttered by individuals with speech impairments against background noise. The NN is trained with a two-phase approach. The pre-training phase uses samples from unimpaired speakers along with samples of background noises and unrelated speech; then the fine-tuning stage uses samples of vowel samples collected from individuals with speech impairments. This model can be built into an experimental mobile app that allows users to activate preconfigured actions such as alerting caregivers. Preliminary user testing indicates the model has the potential to be a useful and flexible emergency communication channel for motor- and speech-impaired individuals.
View details
Disordered Speech Data Collection: Lessons Learned at 1 Million Utterances from Project Euphonia
Bob MacDonald
Rus Heywood
Richard Cave
Katie Seaver
Marilyn Ladewig
Jordan R. Green
Interspeech (2021) (to appear)
Preview abstract
Speech samples from over 1000 individuals with impaired speech have been submitted for Project Euphonia, aimed at improving automated speech recognition for atypical speech. We provide an update on the contents of the corpus, which recently passed 1 million utterances, and review key lessons learned from this project.
The reasoning behind decisions such as phrase set composition, prompted vs extemporaneous speech, metadata and data quality efforts are explained based on findings from both technical and user-facing research.
View details
Scientific Discovery by Generating Counterfactuals using Image Translation
Arununachalam Narayanaswamy
Lily Hao Yi Peng
Dr. Paisan Raumviboonsuk
Avinash Vaidyanathan Varadarajan
Proceedings of MICCAI, International Conference on Medical Image Computing and Computer-Assisted Intervention (2020)
Preview abstract
Visual recognition models are increasingly applied toscientific domains such as, drug studies and medical diag-noses, and model explanation techniques play a critical rolein understanding the source of a model’s performance andmaking its decisions transparent. In this work we investi-gate if explanation techniques can also be used as a mech-anism for scientific discovery. We make two contributions,first we propose a framework to convert predictions from ex-planation techniques to a mechanism of discovery. Secondwe show how generative models in combination with black-box predictors can be used to generate hypotheses (withouthuman priors) that can be critically examined. With thesetechniques we study classification models on retinal fundusimages predicting Diabetic Macular Edema (DME). Essen-tially deep convolutional models on 2D retinal fundus im-ages can do nearly as well as ophthalmologists looking at3D scans, making this an interesting case study of clinicalrelevance. Our work highlights that while existing expla-nation tools are useful, they do not necessarily provide acomplete answer. With the proposed framework we are ableto bridge the gap between model’s performance and humanunderstanding of the underlying mechanism which is of vi-tal scientific interest.
View details
Applying Deep Neural Network Analysis to High-Content Image-Based Assays
Scott L. Lipnick
Nina R. Makhortova
Minjie Fan
Zan Armstrong
Thorsten M. Schlaeger
Liyong Deng
Wendy K. Chung
Liadan O'Callaghan
Anton Geraschenko
Dosh Whye
Jon Hazard
Arunachalam Narayanaswamy
D. Michael Ando
Lee L. Rubin
SLAS DISCOVERY: Advancing Life Sciences R\&D, 0 (2019), pp. 2472555219857715
Preview abstract
The etiological underpinnings of many CNS disorders are not well understood. This is likely due to the fact that individual diseases aggregate numerous pathological subtypes, each associated with a complex landscape of genetic risk factors. To overcome these challenges, researchers are integrating novel data types from numerous patients, including imaging studies capturing broadly applicable features from patient-derived materials. These datasets, when combined with machine learning, potentially hold the power to elucidate the subtle patterns that stratify patients by shared pathology. In this study, we interrogated whether high-content imaging of primary skin fibroblasts, using the Cell Painting method, could reveal disease-relevant information among patients. First, we showed that technical features such as batch/plate type, plate, and location within a plate lead to detectable nuisance signals, as revealed by a pre-trained deep neural network and analysis with deep image embeddings. Using a plate design and image acquisition strategy that accounts for these variables, we performed a pilot study with 12 healthy controls and 12 subjects affected by the severe genetic neurological disorder spinal muscular atrophy (SMA), and evaluated whether a convolutional neural network (CNN) generated using a subset of the cells could distinguish disease states on cells from the remaining unseen control–SMA pair. Our results indicate that these two populations could effectively be differentiated from one another and that model selectivity is insensitive to batch/plate type. One caveat is that the samples were also largely separated by source. These findings lay a foundation for how to conduct future studies exploring diseases with more complex genetic contributions and unknown subtypes.
View details
Similar Image Search for Histopathology: SMILY
Jason Hipp
Michael Emmert-Buck
Daniel Smilkov
Mahul Amin
Craig Mermel
Lily Peng
Martin Stumpe
Nature Partner Journal (npj) Digital Medicine (2019)
Preview abstract
The increasing availability of large institutional and public histopathology image datasets is enabling the searching of these datasets for diagnosis, research, and education. Although these datasets typically have associated metadata such as diagnosis or clinical notes, even carefully curated datasets rarely contain annotations of the location of regions of interest on each image. As pathology images are extremely large (up to 100,000 pixels in each dimension), further laborious visual search of each image may be needed to find the feature of interest. In this paper, we introduce a deep-learning-based reverse image search tool for histopathology images: Similar Medical Images Like Yours (SMILY). We assessed SMILY’s ability to retrieve search results in two ways: using pathologist-provided annotations, and via prospective studies where pathologists evaluated the quality of SMILY search results. As a negative control in the second evaluation, pathologists were blinded to whether search results were retrieved by SMILY or randomly. In both types of assessments, SMILY was able to retrieve search results with similar histologic features, organ site, and prostate cancer Gleason grade compared with the original query. SMILY may be a useful general purpose tool in the pathologist’s arsenal, to improve the efficiency of searching large archives of histopathology images, without the need to develop and implement specific tools for each application.
View details
Assessing microscope image focus quality with deep learning
D. Michael Ando
Mariya Barch
Arunachalam Narayanaswamy
Eric Christiansen
Chris Roat
Jane Hung
Curtis T. Rueden
Asim Shankar
Steven Finkbeiner
BMC Bioinformatics, 19 (2018), pp. 77
Preview abstract
Background: Large image datasets acquired on automated microscopes typically have some fraction of low quality, out-of-focus images, despite the use of hardware autofocus systems. Identification of these images using automated image analysis with high accuracy is important for obtaining a clean, unbiased image dataset. Complicating this task is the fact that image focus quality is only well-defined in foreground regions of images, and as a result, most previous approaches only enable a computation of the relative difference in quality between two or more images, rather than an absolute measure of quality.
Results: We present a deep neural network model capable of predicting an absolute measure of image focus on a single image in isolation, without any user-specified parameters. The model operates at the image-patch level, and also outputs a measure of prediction certainty, enabling interpretable predictions. The model was trained on only 384 in-focus Hoechst (nuclei) stain images of U2OS cells, which were synthetically defocused to one of 11 absolute defocus levels during training. The trained model can generalize on previously unseen real Hoechst stain images, identifying the absolute image focus to within one defocus level (approximately 3 pixel blur diameter difference) with 95% accuracy. On a simpler binary in/out-of-focus classification task, the trained model outperforms previous approaches on both Hoechst and Phalloidin (actin) stain images (F-scores of 0.89 and 0.86, respectively over 0.84 and 0.83), despite only having been presented Hoechst stain images during training. Lastly, we observe qualitatively that the model generalizes to two additional stains, Hoechst and Tubulin, of an unseen cell type (Human MCF-7) acquired on a different instrument.
Conclusions: Our deep neural network enables classification of out-of-focus microscope images with both higher accuracy and greater precision than previous approaches via interpretable patch-level focus and certainty predictions. The use of synthetically defocused images precludes the need for a manually annotated training dataset. The model also generalizes to different image and cell types. The framework for model training and image prediction is available as a free software library and the pre-trained model is available for immediate use in Fiji (ImageJ) and CellProfiler.
View details
In Silico Labeling: Predicting Fluorescent Labels in Unlabeled Images
Eric Christiansen
Mike Ando
Ashkan Javaherian
Gaia Skibinski
Scott Lipnick
Elliot Mount
Alison O'Neil
Kevan Shah
Alicia K. Lee
Piyush Goyal
Liam Fedus
Andre Esteva
Lee Rubin
Steven Finkbeiner
Cell (2018)
Preview abstract
Imaging is a central method in life sciences, and the drive to extract information from microscopy approaches has led to methods to fluorescently label specific cellular constituents. However, the specificity of fluorescent labels varies, labeling can confound biological measurements, and spectral overlap limits the number of labels to a few that can be resolved simultaneously. Here, we developed a deep learning computational approach called “in silico labeling (ISL)” that reliably infers information from unlabeled biological samples that would normally require invasive labeling. ISL predicts different labels in multiple cell types from independent laboratories. It makes cell type predictions by integrating in silico labels, and is not limited by spectral overlap. The network learned generalized features, enabling it to solve new problems with small training datasets. Thus, ISL provides biological insights from images of unlabeled samples for negligible additional cost that would be undesirable or impossible to measure directly.
View details
Detecting Cancer Metastases on Gigapixel Pathology Images
Krishna Kumar Gadepalli
Mohammad Norouzi
Timo Kohlberger
Subhashini Venugopalan
Aleksei Timofeev
Jason Hipp
Lily Peng
Martin Stumpe
arXiv (2017)
Preview abstract
Each year, the treatment decisions for more than 230,000 breast cancer patients in the U.S. hinge on whether the cancer has metastasized away from the breast. Metastasis detection is currently performed by pathologists reviewing large expanses of biological tissues. This process is labor intensive and error-prone. We present a framework to automatically detect and localize tumors as small as 100 x 100 pixels in gigapixel microscopy images sized 100,000 x 100,000 pixels. Our method leverages a convolutional neural network (CNN) architecture and obtains state-of-the-art results on the Camelyon16 dataset in the challenging lesion-level tumor detection task. At 8 false positives per image, we detect 92.4% of the tumors, relative to 82.7% by the previous best automated approach. For comparison, a human pathologist attempting exhaustive search achieved 73.2% sensitivity. We achieve image-level AUC scores above 97% on both the Camelyon16 test set and an independent set of 110 slides. In addition, we discover that two slides in the Camelyon16 training set were erroneously labeled normal. Our approach could considerably reduce false negative rates in metastasis detection.
View details