Kevin McCloskey

Authored Publications
    A scalable system to measure contrail formation on a per-flight basis
    Erica Brand
    Sebastian Eastham
    Carl Elkin
    Thomas Dean
    Zebediah Engberg
    Ulrike Hager
    Joe Ng
    Dinesh Sanekommu
    Tharun Sankar
    Marc Shapiro
    Environmental Research Communications (2024)
    In this work we describe a scalable, automated system to determine from satellite data whether a given flight has made a persistent contrail. The system works by comparing flight segments to contrails detected by a computer vision algorithm running on images from the GOES-16 Advanced Baseline Imager. We develop a 'flight matching' algorithm and use it to label each flight segment as a 'match' or 'non-match'. We perform this analysis on 1.6 million flight segments and compare these labels to existing contrail prediction methods based on weather forecast data. The result is an analysis of which flights make persistent contrails that is several orders of magnitude larger than any previous work. We find that current contrail prediction models fail to correctly predict whether we will match a contrail in many cases.
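The abstract does not describe how the 'flight matching' algorithm decides that a flight segment produced a detected contrail. As a purely illustrative sketch, the code below labels a segment 'match' when some detected contrail lies close to it and runs roughly parallel to it. The planar kilometre coordinates, the `Segment` type, the function names and both thresholds (`max_dist_km`, `max_angle_deg`) are hypothetical; this is not the paper's algorithm. A practical matcher would also have to handle things such as the delay between the flight and the satellite image and the wind drift of the contrail, which this sketch ignores.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Segment:
    # Hypothetical representation: endpoints in a local planar projection, in km.
    x0: float
    y0: float
    x1: float
    y1: float


def point_to_segment_km(px: float, py: float, s: Segment) -> float:
    """Shortest distance from point (px, py) to segment s."""
    dx, dy = s.x1 - s.x0, s.y1 - s.y0
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - s.x0, py - s.y0)
    # Project the point onto the segment's line, then clamp to the endpoints.
    t = ((px - s.x0) * dx + (py - s.y0) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (s.x0 + t * dx), py - (s.y0 + t * dy))


def orientation_deg(s: Segment) -> float:
    """Undirected orientation of a segment, in [0, 180) degrees."""
    return math.degrees(math.atan2(s.y1 - s.y0, s.x1 - s.x0)) % 180.0


def label_segment(
    flight: Segment,
    contrails: Sequence[Segment],
    max_dist_km: float = 10.0,    # hypothetical threshold
    max_angle_deg: float = 15.0,  # hypothetical threshold
) -> str:
    """Return 'match' if any contrail is near and roughly parallel to the flight."""
    for c in contrails:
        mid_x, mid_y = (c.x0 + c.x1) / 2.0, (c.y0 + c.y1) / 2.0
        diff = abs(orientation_deg(flight) - orientation_deg(c))
        diff = min(diff, 180.0 - diff)  # orientations wrap around at 180 degrees
        close = point_to_segment_km(mid_x, mid_y, flight) <= max_dist_km
        if close and diff <= max_angle_deg:
            return "match"
    return "non-match"
```

For example, a flight from (0, 0) to (100, 0) km is labelled 'match' against a contrail from (20, 3) to (60, 4) km, but 'non-match' against a perpendicular contrail crossing its midpoint.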
    The effect of uncertainty in humidity and model parameters on the prediction of contrail energy forcing
    Marc Shapiro
    Zebediah Engberg
    Tharun Sankar
    Marc E.J. Stettler
    Roger Teoh
    Ulrich Schumann
    Susanne Rohs
    Erica Brand
    Environmental Research Communications, 6 (2024), pp. 095015
    Previous work has shown that while the net effect of aircraft condensation trails (contrails) on the climate is warming, the exact magnitude of the energy forcing per meter of contrail remains uncertain. In this paper, we explore the skill of a Lagrangian contrail model (CoCiP) in identifying flight segments with high contrail energy forcing. We find that skill is greater than climatological predictions alone, even accounting for uncertainty in weather fields and model parameters. We estimate the uncertainty due to humidity by using the ensemble ERA5 weather reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) as Monte Carlo inputs to CoCiP. We remove the bias from, and correct the under-dispersion of, the ERA5 humidity data by forcing a match to the distribution of in situ humidity measurements taken at cruising altitude. We take CoCiP energy forcing estimates calculated using one of the ensemble members as a proxy for ground truth, and report the skill of CoCiP in identifying segments with large positive proxy energy forcing. We further estimate the uncertainty due to model parameters in CoCiP by performing Monte Carlo simulations with CoCiP model parameters drawn from uncertainty distributions consistent with the literature. When CoCiP outputs are averaged over seasons to form climatological predictions, the skill in predicting the proxy is 44%, while the skill of per-flight CoCiP outputs is 84%. If these results carry over to the true (unknown) contrail energy forcing (EF), they indicate that per-flight energy forcing predictions can halve the number of potential contrail avoidance route adjustments, reducing both the cost and fuel impact of contrail avoidance.
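The abstract says the ERA5 humidity was corrected "by forcing a match to the distribution of in situ humidity measurements". One standard way to force such a match is empirical quantile mapping: each model value is replaced by the observed value at the same empirical quantile, which removes bias and widens an under-dispersed distribution at the same time. The sketch below shows that technique on plain Python lists; it is an assumption, not necessarily the paper's exact procedure, and the function names are hypothetical.

```python
import bisect
from typing import List, Sequence


def empirical_quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Linearly interpolated quantile q (0 <= q <= 1) of an already-sorted sequence."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def quantile_map(model_vals: Sequence[float], obs_vals: Sequence[float]) -> List[float]:
    """Map each model value onto the observed distribution at the same empirical quantile."""
    model_sorted = sorted(model_vals)
    obs_sorted = sorted(obs_vals)
    n = len(model_sorted)
    mapped = []
    for x in model_vals:
        # Mid-rank of x within the model sample (ties share an averaged rank).
        left = bisect.bisect_left(model_sorted, x)
        right = bisect.bisect_right(model_sorted, x)
        q = ((left + right - 1) / 2.0) / (n - 1) if n > 1 else 0.5
        mapped.append(empirical_quantile(obs_sorted, q))
    return mapped
```

Because the mapping works on ranks, the output keeps the ordering of the model values while taking on the observed distribution's location and spread.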
    Contrail Detection on GOES-16 ABI with the OpenContrails Dataset
    Joe Ng
    Jian Cui
    Vincent Rudolf Meijer
    Erica Brand
    IEEE Transactions on Geoscience and Remote Sensing (2023)
    Contrails (condensation trails) are line-shaped ice clouds caused by aircraft and are a substantial contributor to aviation-induced climate change. Contrail avoidance is potentially an inexpensive way to significantly reduce the climate impact of aviation. An automated contrail detection system is an essential tool to develop and evaluate contrail avoidance systems. In this article, we present a human-labeled dataset named OpenContrails to train and evaluate contrail detection models based on GOES-16 Advanced Baseline Imager (ABI) data. We propose and evaluate a contrail detection model that incorporates temporal context for improved detection accuracy. The human-labeled dataset and the contrail detection outputs are publicly available on Google Cloud Storage at gs://goes_contrails_dataset.
    Estimates of broadband upwelling irradiance from GOES-16 ABI
    Sixing Chen
    Vincent Rudolf Meijer
    Joe Ng
    Geoff Davis
    Carl Elkin
    Remote Sensing of Environment, 285 (2023)
    Satellite-derived estimates of the Earth’s radiation budget are crucial for understanding and predicting the weather and climate. However, existing satellite products measuring broadband outgoing longwave radiation (OLR) and reflected shortwave radiation (RSR) have spatio-temporal resolutions that are too coarse to evaluate important radiative forcers like aircraft condensation trails. We present a neural network which estimates OLR and RSR based on narrowband radiances, using collocated Clouds and the Earth’s Radiant Energy System (CERES) and GOES-16 Advanced Baseline Imager (ABI) data. The resulting estimates feature strong agreement with the CERES data products (R² = 0.977 for OLR and 0.974 for RSR on CERES Level 2 footprints), and we provide open access to the collocated satellite data and model outputs on all available GOES-16 ABI data for the four years 2018–2021.
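The R² values above measure agreement between the neural network estimates and CERES retrievals. A minimal sketch of the standard coefficient of determination, R² = 1 − SS_res / SS_tot, is shown below in plain Python. The abstract does not say whether any weighting or a different R² variant (such as a squared correlation) was used, so treat this as the textbook definition rather than the paper's evaluation code.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the reference values around their mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives R² = 1, and always predicting the mean of the reference values gives R² = 0.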
    Amyotrophic Lateral Sclerosis (ALS) disease progression is usually measured using the subjective, questionnaire-based revised ALS Functional Rating Scale (ALSFRS-R). A purely objective measure for tracking disease progression would be a powerful tool for evaluating real-world drug effectiveness and efficacy in clinical trials, and for identifying participants for cohort studies. Here we develop a machine-learning-based objective measure for ALS disease progression, based on voice samples and accelerometer measurements. The ALS Therapy Development Institute (ALS-TDI) collected a unique dataset of voice and accelerometer samples from consented individuals: 584 people living with ALS, over four years. Participants carried out prescribed speaking and limb-based tasks. 542 participants contributed 5814 voice recordings, and 350 contributed 13009 accelerometer samples, while ALSFRS-R was measured simultaneously. Using the data from 475 participants, we trained machine learning (ML) models, correlating voice with bulbar-related FRS scores and accelerometer data with limb-related scores. On the test set (n=109 participants), the voice models achieved an AUC of 0.86 (95% CI, 0.847–0.884), whereas the accelerometer models achieved a median AUC of 0.73. We used the models and self-reported ALSFRS-R scores to evaluate the real-world effects of edaravone, a drug recently approved for use in ALS, on 54 test participants. In the test cohort, the digital data input into the ML models produced objective measures of progression rates over the duration of the study that were consistent with self-reported scores. This demonstrates the value of these tools for assessing both disease progression and, potentially, drug effects. In this instance, outcomes of edaravone treatment, both self-reported and digital-ML, were highly variable from person to person.
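The AUC figures above summarise how well model scores separate two classes. The sketch below computes AUC directly from its probabilistic definition: the chance that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half (this equals the normalised Mann–Whitney U statistic). The abstract does not state how the FRS scores were binarised or how the confidence interval was computed, so the labels here are assumed to be already binary (0/1), no interval is computed, and `roc_auc` is a hypothetical name.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative), ties = 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive/negative pair; O(n_pos * n_neg), fine for a sketch.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation; for large datasets a rank-based O(n log n) computation would be preferable to the pairwise loop.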
    Contrails (condensation trails) are the ice clouds that trail behind aircraft as they fly through cold and moist regions of the atmosphere. Avoiding these regions could potentially be an inexpensive way to reduce over half of aviation's impact on global warming. Development and evaluation of these avoidance strategies greatly benefits from the ability to detect contrails on satellite imagery. Since little to no public data is available to develop such contrail detectors, we construct and release a dataset of several thousand Landsat-8 scenes with pixel-level annotations of contrails. The dataset will continue to grow, but currently contains 3431 scenes (of which 47% have at least one contrail) representing 800+ person-hours of labeling time.
    Machine learning on DNA-encoded libraries: A new paradigm for hit-finding
    Eric A. Sigel
    Steven Kearnes
    Ling Xue
    Xia Tian
    Dennis Moccia
    Diana Gikunju
    Sana Bazzaz
    Betty Chan
    Matthew A. Clark
    John W. Cuozzo
    Marie-Aude Guié
    John P. Guilinger
    Christelle Huguet
    Christopher D. Hupp
    Anthony D. Keefe
    Christopher J. Mulhern
    Ying Zhang
    Patrick Francis Riley
    Journal of Medicinal Chemistry (2020)
    DNA-encoded small molecule libraries (DELs) have enabled discovery of novel inhibitors for many distinct protein targets of therapeutic value. We demonstrate a new approach applying machine learning to DEL selection data by identifying active molecules from large libraries of commercial and easily synthesizable compounds. We train models using only DEL selection data and apply automated or automatable filters to the predictions. We perform a large prospective study (∼2000 compounds) across three diverse protein targets: sEH (a hydrolase), ERα (a nuclear receptor), and c-KIT (a kinase). The approach is effective, with an overall hit rate of ∼30% at 30 μM and discovery of potent compounds (IC50 < 10 nM) for every target. The system makes useful predictions even for molecules dissimilar to the original DEL, and the compounds identified are diverse, predominantly drug-like, and different from known ligands. This work demonstrates a powerful new approach to hit-finding.
    Evaluating Attribution for Graph Neural Networks
    Alexander B Wiltschko
    Benjamin Sanchez-Lengeling
    Brian Lee
    Jennifer Wei
    Wesley Qian
    Yiliu Wang
    Advances in Neural Information Processing Systems 33 (2020)
    Interpretability of machine learning models is critical to scientific understanding, AI safety, and debugging. Attribution is one approach to interpretability, which highlights input dimensions that are influential to a neural network’s prediction. Evaluation of these methods is largely qualitative for image and text models, because acquiring ground truth attributions requires expensive and unreliable human judgment. Attribution has been comparatively understudied for graph neural networks (GNNs), a model class of growing importance that makes predictions on arbitrarily-sized graphs. Graph-valued data offer an opportunity to quantitatively benchmark attribution methods, because challenging synthetic graph problems have computable ground-truth attributions. In this work we adapt commonly-used attribution methods for GNNs and quantitatively evaluate them using the axes of attribution accuracy, stability, faithfulness and consistency. We make concrete recommendations for which attribution methods to use, and provide the data and code for our benchmarking suite. Rigorous, open-source benchmarking of attribution methods in graphs could enable the development of new methods and broader use of attribution in real-world ML tasks.
    Materials design enables technologies critical to humanity, including combating climate change with solar cells and batteries. Many properties of a material are determined by its atomic crystal structure. However, prediction of the atomic crystal structure for a given material's chemical formula is a long-standing grand challenge that remains a barrier in materials design. We investigate a data-driven approach to accelerating ab initio random structure search (AIRSS), a state-of-the-art method for crystal structure search. We build a novel dataset of random structure relaxations of Li-Si battery anode materials using high-throughput density functional theory calculations. We train graph neural networks to simulate relaxations of random structures. Our model is able to find an experimentally verified structure of Li₁₅Si₄ it was not trained on, and has potential for orders of magnitude speedup over AIRSS when searching large unit cells and searching over multiple chemical stoichiometries. Surprisingly, we find that data augmentation by adding Gaussian noise improves both the accuracy and out-of-domain generalization of our models.
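The abstract reports that data augmentation with Gaussian noise improved accuracy and generalization. A minimal sketch of that kind of augmentation is below, assuming the noise is added independently to each Cartesian coordinate of the input atomic positions. The function name `jitter_positions` and the default `sigma_angstrom=0.05` are hypothetical choices, since the abstract does not say what was perturbed or by how much.

```python
import random
from typing import List, Optional, Sequence, Tuple

Position = Tuple[float, float, float]


def jitter_positions(
    positions: Sequence[Position],
    sigma_angstrom: float = 0.05,  # hypothetical noise scale, in angstroms
    rng: Optional[random.Random] = None,
) -> List[Position]:
    """Return a noisy copy of Cartesian atomic positions (the input is not modified)."""
    if rng is None:
        rng = random.Random()
    # Add independent zero-mean Gaussian noise to every x, y and z coordinate.
    return [
        (
            x + rng.gauss(0.0, sigma_angstrom),
            y + rng.gauss(0.0, sigma_angstrom),
            z + rng.gauss(0.0, sigma_angstrom),
        )
        for x, y, z in positions
    ]
```

In a training loop one would usually draw fresh noise each time a structure is seen, so the model never sees exactly the same input twice; passing a seeded `random.Random` makes a run reproducible.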
    Using attribution to decode binding mechanism in neural network models for chemistry
    Ankur Taly
    Federico Monti
    Proceedings of the National Academy of Sciences (2019), pp. 201820657
    Deep neural networks have achieved state-of-the-art accuracy at classifying molecules with respect to whether they bind to specific protein targets. A key breakthrough would occur if these models could reveal the fragment pharmacophores that are causally involved in binding. Extracting chemical details of binding from the networks could potentially lead to scientific discoveries about the mechanisms of drug actions. But doing so requires shining light into the black box that is the trained neural network model, a task that has proved difficult across many domains. Here we show how the binding mechanism learned by deep neural network models can be interrogated, using a recently described attribution method. We first work with carefully constructed synthetic datasets, in which the 'fragment logic' of binding is fully known. We find that networks that achieve perfect accuracy on held-out test datasets still learn spurious correlations due to biases in the datasets, and we are able to exploit this non-robustness to construct adversarial examples that fool the model. The dataset bias makes these models unreliable for accurately revealing information about the mechanisms of protein-ligand binding. In light of our findings, we prescribe a test that checks for dataset bias given a hypothesis. If the test fails, it indicates that the model must be simplified or regularized, that the training dataset requires augmentation, or both.