Marc Berndl

Marc Berndl has been at Google since 2005, has a Master’s Degree in Computer Science from McGill University, and is Engineering Lead for Google Accelerated Science (GAS). Marc spent eight years in Ads working on auction theory, data analysis, and experimental design. Within GAS, Marc has established ongoing research efforts in material science, biochemistry, cell biology, and drug screening. His current research includes predictive model semantics, solar thermal energy optimization, aptamer design, and methods of detection, localization, and quantification of cellular proteins.
Authored Publications
    Longitudinal fundus imaging and its genome-wide association analysis provides evidence for a human retinal aging clock
    Sara Ahadi
    Kenneth A Wilson Jr
    Orion Pritchard
    Ajay Kumar
    Enrique M Carrera
    Ricardo Lamy
    Jay M Stewart
    Avinash Varadarajan
    Pankaj Kapahi
    Ali Bashir
    eLife (2023)
    Background: Biological age, distinct from an individual’s chronological age, has been studied extensively through predictive aging clocks. However, these clocks have limited accuracy in short time-scales. Deep learning approaches on imaging datasets of the eye have proven powerful for a variety of quantitative phenotype inference and provide an opportunity to explore organismal aging and tissue health.
    Methods: Here we trained deep learning models on fundus images from the EyePacs dataset to predict individuals’ chronological age. These predictions lead to the concept of a retinal aging clock which we then employed for a series of downstream longitudinal analyses. The retinal aging clock was used to assess the predictive power of aging inference, termed eyeAge, on short time-scales using longitudinal fundus imaging data from a subset of patients. Additionally, the model was applied to a separate cohort from the UK Biobank to validate the model and perform a GWAS. The top candidate gene was then tested in a fly model of eye aging.
    Findings: EyeAge was able to predict the age with a mean absolute error of 3.26 years, which is much less than other aging clocks. Additionally, eyeAge was highly independent of blood marker-based measures of biological age (e.g. “phenotypic age”), maintaining a hazard ratio of 1.026 even in the presence of phenotypic age. Longitudinal studies showed that the resulting models were able to predict individuals’ aging on time-scales of less than a year with 71% accuracy. Notably, we observed a significant individual-specific component to the prediction. This observation was confirmed with the identification of multiple GWAS hits in the independent UK Biobank cohort. The knockdown of the top hit, ALKAL2, which was previously shown to extend lifespan in flies, also slowed age-related decline in vision in flies.
    Interpretation: In conclusion, predicted age from retinal images can be used as a biomarker of biological aging in a given individual independently from phenotypic age. This study demonstrates the utility of a retinal aging clock for studying aging and age-related diseases and quantitatively measuring aging on very short time-scales, potentially opening avenues for quick and actionable evaluation of gero-protective therapeutics.
    ProtSeq: towards high-throughput, single-molecule protein sequencing via amino acid conversion into DNA barcodes
    Jessica Hong
    Michael Connor Gibbons
    Ali Bashir
    Diana Wu
    Shirley Shao
    Zachary Cutts
    Mariya Chavarha
    Ye Chen
    Lauren Schiff
    Mikelle Foster
    Victoria Church
    Llyke Ching
    Sara Ahadi
    Anna Hieu-Thao Le
    Alexander Tran
    Michelle Therese Dimon
    Phillip Jess
    iScience, vol. 25 (2022), pp. 32
    We demonstrate early progress toward constructing a high-throughput, single-molecule protein sequencing technology utilizing barcoded DNA aptamers (binders) to recognize terminal amino acids of peptides (targets) tethered on a next-generation sequencing chip. DNA binders deposit unique, amino acid identifying barcodes on the chip. The end goal is that over multiple binding cycles, a sequential chain of DNA barcodes will identify the amino acid sequence of a peptide. Toward this, we demonstrate successful target identification with two sets of target-binder pairs: DNA-DNA and Peptide-Protein. For DNA-DNA binding, we show assembly and sequencing of DNA barcodes over 6 consecutive binding cycles. Intriguingly, our computational simulation predicts that a small set of semi-selective DNA binders offers significant coverage of the human proteome. Toward this end, we introduce a binder discovery pipeline that ultimately could merge with the chip assay into a technology called ProtSeq, for future high-throughput, single-molecule protein sequencing.
    Machine learning guided aptamer discovery
    Ali Bashir
    Geoff Davis
    Michelle Therese Dimon
    Qin Yang
    Scott Ferguson
    Zan Armstrong
    Nature Communications (2021)
    Aptamers are discovered by searching a large library for sequences with desirable binding properties. These libraries, however, are physically constrained to a fraction of the theoretical sequence space and limited to sampling strategies that are easy to scale. Integrating machine learning could enable identification of high-performing aptamers across this unexplored fitness landscape. We employed particle display (PD) to partition aptamers by affinity and trained neural network models to improve physically-derived aptamers and predict affinity in silico. These predictions were used to locally improve physically derived aptamers as well as identify completely novel, high-affinity aptamers de novo. We experimentally validated the predictions, improving aptamer candidate designs at a rate 10-fold higher than random perturbation, and generating novel aptamers at a rate 448-fold higher than PD alone. We characterized the explanatory power of the models globally and locally and showed successful sequence truncation while maintaining affinity. This work combines machine learning and physical discovery, uses principles that are widely applicable to other display technologies, and provides a path forward for better diagnostic and therapeutic agents.
    Discovery of complex oxides via automated experiments and data science
    Joel A Haber
    Zan Armstrong
    Kevin Kan
    Lan Zhou
    Matthias H Richter
    Christopher Roat
    Nicholas Wagner
    Patrick Francis Riley
    John M Gregoire
    Proceedings of the National Academy of Sciences (2021)
    The quest to identify materials with tailored properties is increasingly expanding into high-order composition spaces, where materials discovery efforts have been met with the dual challenges of a combinatorial explosion in the number of candidate materials and a lack of predictive computation to guide experiments. The traditional approach to predictive materials science involves establishing a model that maps composition and structure to properties. We explore an inverse approach wherein a data science workflow uses high throughput measurements of optical properties to identify the composition spaces with interesting materials science. By identifying composition regions whose optical trends cannot be explained by trivial phase behavior, the data science pipeline identifies candidate combinations of elements that form 3-cation metal oxide phases. The identification of such novel phase behavior elevates the measurement of optical properties to the discovery of materials with complex phase-dependent properties. This conceptual workflow is illustrated with Co-Ta-Sn oxides wherein a new rutile alloy is discovered via data science guidance from the high throughput optical characterization. The composition-tuned properties of the rutile oxide alloys include transparency, catalytic activity, and stability in strong acid electrolytes. In addition to the unprecedented mapping of optical properties in 108 unique 3-cation oxide composition spaces, we present a critical discussion of coupling data validation to experiment design to generate a reliable end-to-end high throughput workflow for accelerating scientific discovery.
    We present IDEA (the Induction Dynamics gene Expression Atlas), a dataset constructed by independently inducing hundreds of transcription factors (TFs) and measuring timecourses of the resulting gene expression responses in budding yeast. Each experiment captures a regulatory cascade connecting a single induced regulator to the genes it causally regulates. We discuss the regulatory cascade of a single TF, Aft1, in detail; however, IDEA contains > 200 TF induction experiments with 20 million individual observations and 100,000 signal‐containing dynamic responses. As an application of IDEA, we integrate all timecourses into a whole‐cell transcriptional model, which is used to predict and validate multiple new and underappreciated transcriptional regulators. We also find that the magnitudes of coefficients in this model are predictive of genetic interaction profile similarities. In addition to being a resource for exploring regulatory connectivity between TFs and their target genes, our modeling approach shows that combining rapid perturbations of individual genes with genome‐scale time‐series measurements is an effective strategy for elucidating gene regulatory networks.
    It's easy to fool yourself: Case studies on identifying bias and confounding in bio-medical datasets
    Arunachalam Narayanaswamy
    Scott Lipnick
    Nina Makhortova
    James Hawrot
    Christine Marques
    Joao Pereira
    Lee Rubin
    Brian Wainger
    NeurIPS LMRL workshop 2019 (2019)
    Confounding variables are a well known source of nuisance in biomedical studies. They present an even greater challenge when we combine them with black-box machine learning techniques that operate on raw data. This work presents two case studies. In one, we discovered biases arising from systematic errors in the data generation process. In the other, we found a spurious source of signal unrelated to the prediction task at hand. In both cases, our prediction models performed well but under careful examination hidden confounders and biases were revealed. These are cautionary tales on the limits of using machine learning techniques on raw data from scientific experiments.
    Applying Deep Neural Network Analysis to High-Content Image-Based Assays
    Scott L. Lipnick
    Nina R. Makhortova
    Minjie Fan
    Zan Armstrong
    Thorsten M. Schlaeger
    Liyong Deng
    Wendy K. Chung
    Liadan O'Callaghan
    Dosh Whye
    Jon Hazard
    Arunachalam Narayanaswamy
    D. Michael Ando
    Lee L. Rubin
    SLAS DISCOVERY: Advancing Life Sciences R&D, vol. 0 (2019), pp. 2472555219857715
    The etiological underpinnings of many CNS disorders are not well understood. This is likely due to the fact that individual diseases aggregate numerous pathological subtypes, each associated with a complex landscape of genetic risk factors. To overcome these challenges, researchers are integrating novel data types from numerous patients, including imaging studies capturing broadly applicable features from patient-derived materials. These datasets, when combined with machine learning, potentially hold the power to elucidate the subtle patterns that stratify patients by shared pathology. In this study, we interrogated whether high-content imaging of primary skin fibroblasts, using the Cell Painting method, could reveal disease-relevant information among patients. First, we showed that technical features such as batch/plate type, plate, and location within a plate lead to detectable nuisance signals, as revealed by a pre-trained deep neural network and analysis with deep image embeddings. Using a plate design and image acquisition strategy that accounts for these variables, we performed a pilot study with 12 healthy controls and 12 subjects affected by the severe genetic neurological disorder spinal muscular atrophy (SMA), and evaluated whether a convolutional neural network (CNN) generated using a subset of the cells could distinguish disease states on cells from the remaining unseen control–SMA pair. Our results indicate that these two populations could effectively be differentiated from one another and that model selectivity is insensitive to batch/plate type. One caveat is that the samples were also largely separated by source. These findings lay a foundation for how to conduct future studies exploring diseases with more complex genetic contributions and unknown subtypes.
    Assessing microscope image focus quality with deep learning
    D. Michael Ando
    Mariya Barch
    Arunachalam Narayanaswamy
    Eric Christiansen
    Chris Roat
    Jane Hung
    Curtis T. Rueden
    Asim Shankar
    Steven Finkbeiner
    BMC Bioinformatics, vol. 19 (2018), pp. 77
    Background: Large image datasets acquired on automated microscopes typically have some fraction of low quality, out-of-focus images, despite the use of hardware autofocus systems. Identification of these images using automated image analysis with high accuracy is important for obtaining a clean, unbiased image dataset. Complicating this task is the fact that image focus quality is only well-defined in foreground regions of images, and as a result, most previous approaches only enable a computation of the relative difference in quality between two or more images, rather than an absolute measure of quality.
    Results: We present a deep neural network model capable of predicting an absolute measure of image focus on a single image in isolation, without any user-specified parameters. The model operates at the image-patch level, and also outputs a measure of prediction certainty, enabling interpretable predictions. The model was trained on only 384 in-focus Hoechst (nuclei) stain images of U2OS cells, which were synthetically defocused to one of 11 absolute defocus levels during training. The trained model can generalize on previously unseen real Hoechst stain images, identifying the absolute image focus to within one defocus level (approximately 3 pixel blur diameter difference) with 95% accuracy. On a simpler binary in/out-of-focus classification task, the trained model outperforms previous approaches on both Hoechst and Phalloidin (actin) stain images (F-scores of 0.89 and 0.86, respectively over 0.84 and 0.83), despite only having been presented Hoechst stain images during training. Lastly, we observe qualitatively that the model generalizes to two additional stains, Hoechst and Tubulin, of an unseen cell type (Human MCF-7) acquired on a different instrument.
    Conclusions: Our deep neural network enables classification of out-of-focus microscope images with both higher accuracy and greater precision than previous approaches via interpretable patch-level focus and certainty predictions. The use of synthetically defocused images precludes the need for a manually annotated training dataset. The model also generalizes to different image and cell types. The framework for model training and image prediction is available as a free software library and the pre-trained model is available for immediate use in Fiji (ImageJ) and CellProfiler.
    In Silico Labeling: Predicting Fluorescent Labels in Unlabeled Images
    Eric Christiansen
    Mike Ando
    Ashkan Javaherian
    Gaia Skibinski
    Scott Lipnick
    Elliot Mount
    Alison O'Neil
    Kevan Shah
    Alicia K. Lee
    Piyush Goyal
    Liam Fedus
    Andre Esteva
    Lee Rubin
    Steven Finkbeiner
    Cell (2018)
    Imaging is a central method in life sciences, and the drive to extract information from microscopy approaches has led to methods to fluorescently label specific cellular constituents. However, the specificity of fluorescent labels varies, labeling can confound biological measurements, and spectral overlap limits the number of labels to a few that can be resolved simultaneously. Here, we developed a deep learning computational approach called “in silico labeling (ISL)” that reliably infers information from unlabeled biological samples that would normally require invasive labeling. ISL predicts different labels in multiple cell types from independent laboratories. It makes cell type predictions by integrating in silico labels, and is not limited by spectral overlap. The network learned generalized features, enabling it to solve new problems with small training datasets. Thus, ISL provides biological insights from images of unlabeled samples for negligible additional cost that would be undesirable or impossible to measure directly.
    Image-based screening is a powerful technique to reveal how chemical, genetic, and environmental perturbations affect cellular state. Its potential is restricted by the current analysis algorithms that target a small number of cellular phenotypes and rely on expert-engineered image features. Newer algorithms that learn how to represent an image are limited by the small amount of labeled data for ground-truth, a common problem for scientific projects. We demonstrate a sensitive and robust method for distinguishing cellular phenotypes that requires no additional ground-truth data or training. It achieves state-of-the-art performance classifying drugs by similar molecular mechanism, using a Deep Metric Network that has been pre-trained on consumer images and a transformation that improves sensitivity to biological variation. However, our method is not limited to classification into predefined categories. It provides a continuous measure of the similarity between cellular phenotypes that can also detect subtle differences such as from increasing dose. The rich, biologically-meaningful image representation that our method provides can help therapy development by supporting high-throughput investigations, even exploratory ones, with more sophisticated and disease-relevant models.
    Molecular graph convolutions: moving beyond fingerprints
    Steven Kearnes
    Vijay Pande
    Patrick Riley
    Journal of Computer-Aided Molecular Design (2016), pp. 1-14
    Molecular “fingerprints” encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph—atoms, bonds, distances, etc.—which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement.