Anton Geraschenko

Anton is a software engineer on the Google Accelerated Science team. His previous work at Google includes natural language understanding and image search. His previous work outside of Google includes research and engineering in machine perception. He got his PhD in mathematics from UC Berkeley in 2011, after which he was a postdoc at Caltech. His academic work specialized in algebraic geometry, particularly the geometry of algebraic stacks. He founded mathoverflow.net in 2009, and served as its benevolent dictator for several years.
Authored Publications
    It's easy to fool yourself: Case studies on identifying bias and confounding in bio-medical datasets
    Arunachalam Narayanaswamy
    Scott Lipnick
    Nina Makhortova
    James Hawrot
    Christine Marques
    Joao Pereira
    Lee Rubin
    Brian Wainger
    NeurIPS LMRL workshop 2019 (2019)
    Confounding variables are a well known source of nuisance in biomedical studies. They present an even greater challenge when we combine them with black-box machine learning techniques that operate on raw data. This work presents two case studies. In one, we discovered biases arising from systematic errors in the data generation process. In the other, we found a spurious source of signal unrelated to the prediction task at hand. In both cases, our prediction models performed well but under careful examination hidden confounders and biases were revealed. These are cautionary tales on the limits of using machine learning techniques on raw data from scientific experiments.
    Applying Deep Neural Network Analysis to High-Content Image-Based Assays
    Scott L. Lipnick
    Nina R. Makhortova
    Minjie Fan
    Zan Armstrong
    Thorsten M. Schlaeger
    Liyong Deng
    Wendy K. Chung
    Liadan O'Callaghan
    Dosh Whye
    Jon Hazard
    Arunachalam Narayanaswamy
    D. Michael Ando
    Lee L. Rubin
    SLAS DISCOVERY: Advancing Life Sciences R&D, vol. 0 (2019), pp. 2472555219857715
    The etiological underpinnings of many CNS disorders are not well understood. This is likely due to the fact that individual diseases aggregate numerous pathological subtypes, each associated with a complex landscape of genetic risk factors. To overcome these challenges, researchers are integrating novel data types from numerous patients, including imaging studies capturing broadly applicable features from patient-derived materials. These datasets, when combined with machine learning, potentially hold the power to elucidate the subtle patterns that stratify patients by shared pathology. In this study, we interrogated whether high-content imaging of primary skin fibroblasts, using the Cell Painting method, could reveal disease-relevant information among patients. First, we showed that technical features such as batch/plate type, plate, and location within a plate lead to detectable nuisance signals, as revealed by a pre-trained deep neural network and analysis with deep image embeddings. Using a plate design and image acquisition strategy that accounts for these variables, we performed a pilot study with 12 healthy controls and 12 subjects affected by the severe genetic neurological disorder spinal muscular atrophy (SMA), and evaluated whether a convolutional neural network (CNN) generated using a subset of the cells could distinguish disease states on cells from the remaining unseen control–SMA pair. Our results indicate that these two populations could effectively be differentiated from one another and that model selectivity is insensitive to batch/plate type. One caveat is that the samples were also largely separated by source. These findings lay a foundation for how to conduct future studies exploring diseases with more complex genetic contributions and unknown subtypes.