Jump to Content
Kevin McCloskey

Kevin McCloskey

Authored Publications
Google Publications
Other Publications
Sort By
  • Title
  • Title, desc
  • Year
  • Year, desc
    A scalable system to measure contrail formation on a per-flight basis
    Marc Shapiro
    Erica Brand
    Sebastian Eastham
    Carl Elkin
    Thomas Dean
    Zebediah Engberg
    Ulrike Hager
    Joe Ng
    Dinesh Sanekommu
    Tharun Sankar
    Environmental Research Communications (2024)
    Preview abstract In this work we describe a scalable, automated system to determine from satellite data whether a given flight has made a persistent contrail. The system works by comparing flight segments to contrails detected by a computer vision algorithm running on images from the GOES-16 Advanced Baseline Imager. We develop a `flight matching' algorithm and use it to label each flight segment as a `match' or `non-match'. We perform this analysis on 1.6 million flight segments and compare these labels to existing contrail prediction methods based on weather forecast data. The result is an analysis of which flights make persistent contrails several orders of magnitude larger than any previous work. We find that current contrail prediction models fail to correctly predict whether we will match a contrail in many cases. View details
    Estimates of broadband upwelling irradiance from GOES-16 ABI
    Sixing Chen
    Vincent Rudolf Meijer
    Joe Ng
    Geoff Davis
    Carl Elkin
    Remote Sensing of Environment, vol. 285 (2023)
    Preview abstract Satellite-derived estimates of the Earth’s radiation budget are crucial for understanding and predicting the weather and climate. However, existing satellite products measuring broadband outgoing longwave radiation (OLR) and reflected shortwave radiation (RSR) have spatio-temporal resolutions that are too coarse to evaluate important radiative forcers like aircraft condensation trails. We present a neural network which estimates OLR and RSR based on narrowband radiances, using collocated Cloud and Earth’s Radiant Energy System (CERES) and GOES-16 Advanced Baseline Imager (ABI) data. The resulting estimates feature strong agreement with the CERES data products (R^2 = 0.977 for OLR and 0.974 for RSR on CERES Level 2 footprints), and we provide open access to the collocated satellite data and model outputs on all available GOES-16 ABI data for the 4 years from 2018–2021. View details
    Contrail Detection on GOES-16 ABI with the OpenContrails Dataset
    Joe Ng
    Jian Cui
    Vincent Rudolf Meijer
    Erica Brand
    IEEE Transactions on Geoscience and Remote Sensing (2023)
    Preview abstract Contrails (condensation trails) are line-shaped ice clouds caused by aircraft and are a substantial contributor to aviation-induced climate change. Contrail avoidance is potentially an inexpensive way to significantly reduce the climate impact of aviation. An automated contrail detection system is an essential tool to develop and evaluate contrail avoidance systems. In this article, we present a human-labeled dataset named OpenContrails to train and evaluate contrail detection models based on GOES-16 Advanced Baseline Imager (ABI) data. We propose and evaluate a contrail detection model that incorporates temporal context for improved detection accuracy. The human labeled dataset and the contrail detection outputs are publicly available on Google Cloud Storage at gs://goes_contrails_dataset . View details
    Preview abstract Amyotrophic Lateral Sclerosis (ALS) disease progression is usually measured using the subjective, questionnaire-based revised ALS Functional Rating Scale (ALSFRS-R). A purely objective measure for tracking disease progression would be a powerful tool for evaluating real-world drug effectiveness, efficacy in clinical trials, as well as identifying participants for cohort studies. Here we develop a machine learning based objective measure for ALS disease progression, based on voice samples and accelerometer measurements. The ALS Therapy Development Institute (ALS-TDI) collected a unique dataset of voice and accelerometer samples from consented individuals - 584 people living with ALS over four years. Participants carried out prescribed speaking and limb-based tasks. 542 participants contributed 5814 voice recordings, and 350 contributed 13009 accelerometer samples, while simultaneously measuring ALSFRS-R. Using the data from 475 participants, we trained machine learning (ML) models, correlating voice with bulbar-related FRS scores and accelerometer with limb related scores. On the test set (n=109 participants) the voice models achieved an AUC of 0.86 (95% CI, 0.847-0.884) , whereas the accelerometer models achieved a median AUC of 0.73 . We used the models and self-reported ALSFRS-R scores to evaluate the real-world effects of edaravone, a drug recently approved for use in ALS, on 54 test participants. In the test cohort, the digital data input into the ML models produced objective measures of progression rates over the duration of the study that were consistent with self-reported scores. This demonstrates the value of these tools for assessing both disease progression and potentially drug effects. In this instance, outcomes from edaravone treatment, both self-reported and digital-ML, resulted in highly variable outcomes from person to person. View details
    Preview abstract Contrails (condensation trails) are the ice clouds that trail behind aircraft as they fly through cold and moist regions of the atmosphere. Avoiding these regions could potentially be an inexpensive way to reduce over half of aviation's impact on global warming. Development and evaluation of these avoidance strategies greatly benefits from the ability to detect contrails on satellite imagery. Since little to no public data is available to develop such contrail detectors, we construct and release a dataset of several thousand Landsat-8 scenes with pixel-level annotations of contrails. The dataset will continue to grow, but currently contains 3431 scenes (of which 47\% have at least one contrail) representing 800+ person-hours of labeling time. View details
    Preview abstract Materials design enables technologies critical to humanity, including combating climate change with solar cells and batteries. Many properties of a material are determined by its atomic crystal structure. However, prediction of the atomic crystal structure for a given material's chemical formula is a long-standing grand challenge that remains a barrier in materials design. We investigate a data-driven approach to accelerating ab initio random structure search (AIRSS), a state-of-the-art method for crystal structure search. We build a novel dataset of random structure relaxations of Li-Si battery anode materials using high-throughput density functional theory calculations. We train graph neural networks to simulate relaxations of random structures. Our model is able to find an experimentally verified structure of Li15Si4 it was not trained on, and has potential for orders of magnitude speedup over AIRSS when searching large unit cells and searching over multiple chemical stoichiometries. Surprisingly, we find that data augmentation of adding Gaussian noise improves both the accuracy and out of domain generalization of our models. View details
    Machine learning on DNA-encoded libraries: A new paradigm for hit-finding
    Eric A. Sigel
    Steven Kearnes
    Ling Xue
    Xia Tian
    Dennis Moccia
    Diana Gikunju
    Sana Bazzaz
    Betty Chan
    Matthew A. Clark
    John W. Cuozzo
    Marie-Aude Guié
    John P. Guilinger
    Christelle Huguet
    Christopher D. Hupp
    Anthony D. Keefe
    Christopher J. Mulhern
    Ying Zhang
    Patrick Francis Riley
    Journal of Medicinal Chemistry (2020)
    Preview abstract DNA-encoded small molecule libraries (DELs) have enabled discovery of novel inhibitors for many distinct protein targets of therapeutic value. We demonstrate a new approach applying machine learning to DEL selection data by identifying active molecules from large libraries of commercial and easily synthesizable compounds. We train models using only DEL selection data and apply automated or automatable filters to the predictions. We perform a large prospective study (∼2000 compounds) across three diverse protein targets: sEH (a hydrolase), ERα (a nuclear receptor), and c-KIT (a kinase). The approach is effective, with an overall hit rate of ∼30% at 30 μM and discovery of potent compounds (IC50 < 10 nM) for every target. The system makes useful predictions even for molecules dissimilar to the original DEL, and the compounds identified are diverse, predominantly drug-like, and different from known ligands. This work demonstrates a powerful new approach to hit-finding. View details
    Evaluating Attribution for Graph Neural Networks
    Alexander B Wiltschko
    Brian Lee
    Jennifer Wei
    Wesley Qian
    Yiliu Wang
    Advances in Neural Information Processing Systems 33 (2020)
    Preview abstract Interpretability of machine learning models is critical to scientific understanding, AI safety, and debugging. Attribution is one approach to interpretability, which highlights input dimensions that are influential to a neural network’s prediction. Evaluation of these methods is largely qualitative for image and text models, because acquiring ground truth attributions requires expensive and unreliable human judgment. Attribution has been comparatively understudied for graph neural networks (GNNs), a model class of growing importance that makes predictions on arbitrarily-sized graphs. Graph-valued data offer an opportunity to quantitatively benchmark attribution methods, because challenging synthetic graph problems have computable ground-truth attributions. In this work we adapt commonly-used attribution methods for GNNs and quantitatively evaluate them using the axes of attribution accuracy, stability, faithfulness and consistency. We make concrete recommendations for which attribution methods to use, and provide the data and code for our benchmarking suite. Rigorous and open source benchmarking of attribution methods in graphs could enable new methods development and broader use of attribution in real-world ML tasks. View details
    Using attribution to decode binding mechanism in neural network models for chemistry
    Ankur Taly
    Federico Monti
    Proceedings of the National Academy of Sciences (2019), pp. 201820657
    Preview abstract Deep neural networks have achieved state of the art accuracy at classifying molecules with respect to whether they bind to specific protein targets. A key breakthrough would occur if these models could reveal the fragment pharmacophores that are causally involved in binding. Extracting chemical details of binding from the networks could potentially lead to scientific discoveries about the mechanisms of drug actions. But doing so requires shining light into the black box that is the trained neural network model, a task that has proved difficult across many domains. Here we show how the binding mechanism learned by deep neural network models can be interrogated, using a recently described attribution method. We first work with carefully constructed synthetic datasets, in which the 'fragment logic' of binding is fully known. We find that networks that achieve perfect accuracy on held out test datasets still learn spurious correlations due to biases in the datasets, and we are able to exploit this non-robustness to construct adversarial examples that fool the model. The dataset bias makes these models unreliable for accurately revealing information about the mechanisms of protein-ligand binding. In light of our findings, we prescribe a test that checks for dataset bias given a hypothesis. If the test fails, it indicates that either the model must be simplified or regularized and/or that the training dataset requires augmentation. View details
    ML-driven search for zero-emissions ammonia production materials
    The International Conference on Machine Learning, Workshop on Climate Change (2019)
    Preview abstract Ammonia (NH3) production is an industrial process that consumes between 1-2% of global energy annually and is responsible for 2-3% of greenhouse gas emissions. Ammonia is primarily used for agricultural fertilizers, but it also conforms to the US DOE targets for hydrogen storage materials. Modern industrial facilities use the century-old Haber-Bosch process, whose energy usage and carbon emissions are strongly dominated by the use of methane as the combined energy source and hydrogen feedstock, not by the energy used to maintain elevated temperatures and pressures. Generating the hydrogen feedstock with renewable electricity through water electrolysis is an option that would allow retrofitting the billions of dollars of invested capital in Haber-Bosch production capacity. Economic viability is however strongly dependent on the relative regional prices of methane and renewable energy; renewables have been trending lower in cost but forecasting methane prices is difficult. Electrochemical ammonia production, which can use aqueous or steam H2O as its hydrogen source (first demonstrated ˜20 years ago) is a promising means of emissions-free ammonia production. Its viability is also linked to the relative price of renewable energy versus methane, but in principle it can be significantly more cost-effective than Haber-Bosch and also downscale to developing areas lacking ammonia transport infrastructure. However to date it has only been demonstrated at laboratory scales with yields and Faradaic efficiencies insufficient to be economically competitive. Promising machine-learning approaches to fix this are discussed. View details
    Molecular graph convolutions: moving beyond fingerprints
    Steven Kearnes
    Vijay Pande
    Patrick Riley
    Journal of Computer-Aided Molecular Design (2016), pp. 1-14
    Preview abstract Molecular “fingerprints” encoding structural information are the workhorse of cheminformatics and machine learning in drug discovery applications. However, fingerprint representations necessarily emphasize particular aspects of the molecular structure while ignoring others, rather than allowing the model to make data-driven decisions. We describe molecular graph convolutions, a machine learning architecture for learning from undirected graphs, specifically small molecules. Graph convolutions use a simple encoding of the molecular graph—atoms, bonds, distances, etc.—which allows the model to take greater advantage of information in the graph structure. Although graph convolutions do not outperform all fingerprint-based methods, they (along with other graph-based methods) represent a new paradigm in ligand-based virtual screening with exciting opportunities for future improvement. View details
    No Results Found