Marc A. Coram
Marc got a BS in Math and CS (E&AS) from Caltech, then a PhD in Statistics from Stanford. He was an Assistant Professor of Statistics at the University of Chicago (2002-2006) and an Assistant Professor of Biostatistics at Stanford (2006-2014) before joining Google as a software engineer on the Google Accelerated Science team.
Authored Publications
ProtSeq: towards high-throughput, single-molecule protein sequencing via amino acid conversion into DNA barcodes
Jessica Hong
Michael Connor Gibbons
Ali Bashir
Diana Wu
Shirley Shao
Zachary Cutts
Mariya Chavarha
Ye Chen
Lauren Schiff
Mikelle Foster
Victoria Church
Llyke Ching
Sara Ahadi
Anna Hieu-Thao Le
Alexander Tran
Michelle Therese Dimon
Phillip Jess
iScience, 25 (2022), pp. 32
We demonstrate early progress toward constructing a high-throughput, single-molecule protein sequencing technology utilizing barcoded DNA aptamers (binders) to recognize terminal amino acids of peptides (targets) tethered on a next-generation sequencing chip. DNA binders deposit unique, amino acid identifying barcodes on the chip. The end goal is that over multiple binding cycles, a sequential chain of DNA barcodes will identify the amino acid sequence of a peptide. Toward this, we demonstrate successful target identification with two sets of target-binder pairs: DNA-DNA and Peptide-Protein. For DNA-DNA binding, we show assembly and sequencing of DNA barcodes over 6 consecutive binding cycles. Intriguingly, our computational simulation predicts that a small set of semi-selective DNA binders offers significant coverage of the human proteome. Toward this end, we introduce a binder discovery pipeline that ultimately could merge with the chip assay into a technology called ProtSeq, for future high-throughput, single-molecule protein sequencing.
Discovery of complex oxides via automated experiments and data science
Joel A Haber
Zan Armstrong
Kevin Kan
Lan Zhou
Matthias H Richter
Christopher Roat
Nicholas Wagner
Patrick Francis Riley
John M Gregoire
Proceedings of the National Academy of Sciences (2021)
The quest to identify materials with tailored properties is increasingly expanding into high-order composition spaces, where materials discovery efforts have been met with the dual challenges of a combinatorial explosion in the number of candidate materials and a lack of predictive computation to guide experiments. The traditional approach to predictive materials science involves establishing a model that maps composition and structure to properties. We explore an inverse approach wherein a data science workflow uses high throughput measurements of optical properties to identify the composition spaces with interesting materials science. By identifying composition regions whose optical trends cannot be explained by trivial phase behavior, the data science pipeline identifies candidate combinations of elements that form 3-cation metal oxide phases. The identification of such novel phase behavior elevates the measurement of optical properties to the discovery of materials with complex phase-dependent properties. This conceptual workflow is illustrated with Co-Ta-Sn oxides wherein a new rutile alloy is discovered via data science guidance from the high throughput optical characterization. The composition-tuned properties of the rutile oxide alloys include transparency, catalytic activity, and stability in strong acid electrolytes. In addition to the unprecedented mapping of optical properties in 108 unique 3-cation oxide composition spaces, we present a critical discussion of coupling data validation to experiment design to generate a reliable end-to-end high throughput workflow for accelerating scientific discovery.
We present IDEA (the Induction Dynamics gene Expression Atlas), a dataset constructed by independently inducing hundreds of transcription factors (TFs) and measuring timecourses of the resulting gene expression responses in budding yeast. Each experiment captures a regulatory cascade connecting a single induced regulator to the genes it causally regulates. We discuss the regulatory cascade of a single TF, Aft1, in detail; however, IDEA contains > 200 TF induction experiments with 20 million individual observations and 100,000 signal‐containing dynamic responses. As an application of IDEA, we integrate all timecourses into a whole‐cell transcriptional model, which is used to predict and validate multiple new and underappreciated transcriptional regulators. We also find that the magnitudes of coefficients in this model are predictive of genetic interaction profile similarities. In addition to being a resource for exploring regulatory connectivity between TFs and their target genes, our modeling approach shows that combining rapid perturbations of individual genes with genome‐scale time‐series measurements is an effective strategy for elucidating gene regulatory networks.
Quantum Optimization with a Novel Gibbs Objective Function and Ansatz Architecture Search
Li Li
Minjie Fan
Patrick Riley
Stefan Leichenauer
Phys. Rev. Research, 2 (2020), pp. 023074
The quantum approximate optimization algorithm (QAOA) is a standard method for combinatorial optimization with a gate-based quantum computer. The QAOA consists of a particular ansatz for the quantum circuit architecture, together with a prescription for choosing the variational parameters of the circuit. We propose modifications to both. First, we define the Gibbs objective function and show that it is superior to the energy expectation value for use as an objective function in tuning the variational parameters. Second, we describe an ansatz architecture search (AAS) algorithm for searching the discrete space of quantum circuit architectures near the QAOA to find a better ansatz. Applying these modifications for a complete graph Ising model results in a 244.7% median relative improvement in the probability of finding a low-energy state while using 33.3% fewer two-qubit gates. For Ising models on a 2d grid we similarly find 44.4% median improvement in the probability with a 20.8% reduction in the number of two-qubit gates. This opens a new research field of quantum circuit architecture design for quantum optimization algorithms.
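As a rough illustration of the first modification, the sketch below contrasts the energy expectation value with a Gibbs-style objective computed from sampled energies. The abstract does not give the formula, so the form used here, f = -log of the mean of exp(-eta * E) with a positive hyperparameter eta, is an assumption, and the function names and sample energies are hypothetical.

```python
import math


def energy_expectation(energies):
    """Mean energy of the sampled states (the usual QAOA objective)."""
    return sum(energies) / len(energies)


def gibbs_objective(energies, eta=1.0):
    """Assumed Gibbs-style objective: -log(mean(exp(-eta * E))).

    The minimum energy is factored out before exponentiating so that
    large negative energies do not overflow.
    """
    shift = min(energies)
    mean_weight = sum(math.exp(-eta * (e - shift)) for e in energies) / len(energies)
    return eta * shift - math.log(mean_weight)


# Hypothetical samples: one low-energy state among several mediocre ones.
samples = [-3.0, 0.0, 0.0, 0.0]
mean_energy = energy_expectation(samples)
gibbs_value = gibbs_objective(samples, eta=1.0)
```

Under this assumed form, the exponential weighting rewards the single low-energy sample more strongly than the plain mean does, which matches the abstract's emphasis on the probability of finding a low-energy state.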
Performance of a Deep-Learning Algorithm vs Manual Grading for Detecting Diabetic Retinopathy in India
Renu P. Rajan
Derek Wu
Peter Wubbels
Tyler Rhodes
Kira Whitehouse
Ramasamy Kim
Rajiv Raman
Lily Peng
JAMA Ophthalmology (2019)
Importance More than 60 million people in India have diabetes and are at risk for diabetic retinopathy (DR), a vision-threatening disease. Automated interpretation of retinal fundus photographs can help support and scale a robust screening program to detect DR.
Objective To prospectively validate the performance of an automated DR system across 2 sites in India.
Design, Setting, and Participants This prospective observational study was conducted at 2 eye care centers in India (Aravind Eye Hospital and Sankara Nethralaya) and included 3049 patients with diabetes. Data collection and patient enrollment took place between April 2016 and July 2016 at Aravind and May 2016 and April 2017 at Sankara Nethralaya. The model was trained and fixed in March 2016.
Interventions Automated DR grading system compared with manual grading by 1 trained grader and 1 retina specialist from each site. Adjudication by a panel of 3 retinal specialists served as the reference standard in the cases of disagreement.
Main Outcomes and Measures Sensitivity and specificity for moderate or worse DR or referable diabetic macular edema.
Results Of 3049 patients, 1091 (35.8%) were women and the mean (SD) age for patients at Aravind and Sankara Nethralaya was 56.6 (9.0) years and 56.0 (10.0) years, respectively. For moderate or worse DR, the sensitivity and specificity for manual grading by individual nonadjudicator graders ranged from 73.4% to 89.8% and from 83.5% to 98.7%, respectively. The automated DR system’s performance was equal to or exceeded manual grading, with an 88.9% sensitivity (95% CI, 85.8-91.5), 92.2% specificity (95% CI, 90.3-93.8), and an area under the curve of 0.963 on the data set from Aravind Eye Hospital and 92.1% sensitivity (95% CI, 90.1-93.8), 95.2% specificity (95% CI, 94.2-96.1), and an area under the curve of 0.980 on the data set from Sankara Nethralaya.
Conclusions and Relevance This study shows that the automated DR system generalizes to this population of Indian patients in a prospective setting and demonstrates the feasibility of using an automated DR grading system to expand screening programs.
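For readers less familiar with the two outcome measures, the following minimal sketch shows how sensitivity and specificity are computed from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): share of diseased eyes that are flagged.
    specificity = TN / (TN + FP): share of healthy eyes that are cleared.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=180, fp=20)
```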
Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs
Lily Peng
Martin C Stumpe
Derek Wu
Arunachalam Narayanaswamy
Subhashini Venugopalan
Tom Madams
Jorge Cuadros
Ramasamy Kim
Rajiv Raman
Jessica Mega
JAMA (2016)
Importance: Deep learning is a family of computational methods that allow an algorithm to program itself by learning from a large set of examples that demonstrate the desired behavior, removing the need to specify rules explicitly. Application of these methods to medical imaging requires further assessment and validation.
Objective: To apply deep learning to create an algorithm for automated detection of diabetic retinopathy and diabetic macular edema in retinal fundus photographs.
Design and Setting: A specific type of neural network optimized for image classification called a deep convolutional neural network was trained using a retrospective development data set of 128 175 retinal images, which were graded 3 to 7 times for diabetic retinopathy, diabetic macular edema, and image gradability by a panel of 54 US licensed ophthalmologists and ophthalmology senior residents between May and December 2015. The resultant algorithm was validated in January and February 2016 using 2 separate data sets, both graded by at least 7 US board-certified ophthalmologists with high intragrader consistency.
Exposure: Deep learning–trained algorithm.
Main Outcomes and Measures: The sensitivity and specificity of the algorithm for detecting referable diabetic retinopathy (RDR), defined as moderate and worse diabetic retinopathy, referable diabetic macular edema, or both, were generated based on the reference standard of the majority decision of the ophthalmologist panel. The algorithm was evaluated at 2 operating points selected from the development set, one selected for high specificity and another for high sensitivity.
Results: The EyePACS-1 data set consisted of 9963 images from 4997 patients (mean age, 54.4 years; 62.2% women; prevalence of RDR, 683/8878 fully gradable images [7.8%]); the Messidor-2 data set had 1748 images from 874 patients (mean age, 57.6 years; 42.6% women; prevalence of RDR, 254/1745 fully gradable images [14.6%]). For detecting RDR, the algorithm had an area under the receiver operating curve of 0.991 (95% CI, 0.988-0.993) for EyePACS-1 and 0.990 (95% CI, 0.986-0.995) for Messidor-2. Using the first operating cut point with high specificity, for EyePACS-1, the sensitivity was 90.3% (95% CI, 87.5%-92.7%) and the specificity was 98.1% (95% CI, 97.8%-98.5%). For Messidor-2, the sensitivity was 87.0% (95% CI, 81.1%-91.0%) and the specificity was 98.5% (95% CI, 97.7%-99.1%). Using a second operating point with high sensitivity in the development set, for EyePACS-1 the sensitivity was 97.5% and specificity was 93.4% and for Messidor-2 the sensitivity was 96.1% and specificity was 93.9%.
Conclusions and Relevance: In this evaluation of retinal fundus photographs from adults with diabetes, an algorithm based on deep machine learning had high sensitivity and specificity for detecting referable diabetic retinopathy. Further research is necessary to determine the feasibility of applying this algorithm in the clinical setting and to determine whether use of the algorithm could lead to improved care and outcomes compared with current ophthalmologic assessment.
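The abstract mentions operating points selected from the development set. One simple way such a selection could work is sketched below: pick the lowest score threshold whose specificity on the development data meets a target. This is an illustrative assumption, not the authors' procedure; the function name, scores, and labels are hypothetical.

```python
def threshold_for_specificity(scores, labels, target_specificity):
    """Lowest threshold whose development-set specificity meets the target.

    An example is predicted positive when its score is >= the threshold,
    so the lowest qualifying threshold keeps sensitivity as high as possible.
    """
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not negatives:
        raise ValueError("need at least one negative example")
    # Candidate thresholds: each observed score, plus infinity (predict all negative).
    for t in sorted(set(scores)) + [float("inf")]:
        true_negatives = sum(1 for s in negatives if s < t)
        if true_negatives / len(negatives) >= target_specificity:
            return t
    return float("inf")


# Hypothetical development-set scores and labels (1 = referable disease).
dev_scores = [0.1, 0.2, 0.3, 0.4, 0.8, 0.9]
dev_labels = [0, 0, 0, 1, 0, 1]
high_sensitivity_point = threshold_for_specificity(dev_scores, dev_labels, 0.75)
high_specificity_point = threshold_for_specificity(dev_scores, dev_labels, 1.0)
```

A stricter specificity target pushes the threshold up and trades away sensitivity, which is the trade-off the two reported operating points reflect.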