Kevin McCloskey
Authored Publications
A scalable system to measure contrail formation on a per-flight basis
Erica Brand
Sebastian Eastham
Carl Elkin
Thomas Dean
Zebediah Engberg
Ulrike Hager
Joe Ng
Dinesh Sanekommu
Tharun Sankar
Marc Shapiro
Environmental Research Communications (2024)
In this work we describe a scalable, automated system to determine from satellite data whether a given flight has made a persistent contrail.
The system works by comparing flight segments to contrails detected by a computer vision algorithm running on images from the GOES-16 Advanced Baseline Imager. We develop a 'flight matching' algorithm and use it to label each flight segment as a 'match' or 'non-match'. We perform this analysis on 1.6 million flight segments and compare these labels to existing contrail prediction methods based on weather forecast data. The result is an analysis of which flights make persistent contrails that is several orders of magnitude larger than any previous work. We find that, in many cases, current contrail prediction models fail to correctly predict whether a flight segment will match a contrail.
The effect of uncertainty in humidity and model parameters on the prediction of contrail energy forcing
Marc Shapiro
Zebediah Engberg
Tharun Sankar
Marc E.J. Stettler
Roger Teoh
Ulrich Schumann
Susanne Rohs
Erica Brand
Environmental Research Communications, 6 (2024), pp. 095015
Previous work has shown that while the net effect of aircraft condensation trails (contrails) on the climate is warming, the exact magnitude of the energy forcing (EF) per meter of contrail remains uncertain. In this paper, we explore the skill of a Lagrangian contrail model (CoCiP) in identifying flight segments with high contrail energy forcing. We find that its skill is greater than that of climatological predictions alone, even accounting for uncertainty in weather fields and model parameters. We estimate the uncertainty due to humidity by using the ensemble ERA5 weather reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) as Monte Carlo inputs to CoCiP. We remove bias and correct under-dispersion in the ERA5 humidity data by forcing a match to the distribution of in situ humidity measurements taken at cruising altitude. We take CoCiP energy forcing estimates calculated using one of the ensemble members as a proxy for ground truth, and report the skill of CoCiP in identifying segments with large positive proxy energy forcing. We further estimate the uncertainty due to model parameters in CoCiP by performing Monte Carlo simulations with CoCiP model parameters drawn from uncertainty distributions consistent with the literature. When CoCiP outputs are averaged over seasons to form climatological predictions, the skill in predicting the proxy is 44%, while the skill of per-flight CoCiP outputs is 84%. If these results carry over to the true (unknown) contrail EF, they indicate that per-flight energy forcing predictions can reduce the number of potential contrail avoidance route adjustments by a factor of two, hence reducing both the cost and fuel impact of contrail avoidance.
Contrail Detection on GOES-16 ABI with the OpenContrails Dataset
Joe Ng
Jian Cui
Vincent Rudolf Meijer
Erica Brand
IEEE Transactions on Geoscience and Remote Sensing (2023)
Contrails (condensation trails) are line-shaped ice clouds caused by aircraft and are a substantial contributor to aviation-induced climate change. Contrail avoidance is potentially an inexpensive way to significantly reduce the climate impact of aviation. An automated contrail detection system is an essential tool to develop and evaluate contrail avoidance systems. In this article, we present a human-labeled dataset named OpenContrails to train and evaluate contrail detection models based on GOES-16 Advanced Baseline Imager (ABI) data. We propose and evaluate a contrail detection model that incorporates temporal context for improved detection accuracy. The human-labeled dataset and the contrail detection outputs are publicly available on Google Cloud Storage at gs://goes_contrails_dataset.
Estimates of broadband upwelling irradiance from GOES-16 ABI
Sixing Chen
Vincent Rudolf Meijer
Joe Ng
Geoff Davis
Carl Elkin
Remote Sensing of Environment, 285 (2023)
Satellite-derived estimates of the Earth’s radiation budget are crucial for understanding and predicting the weather and climate. However, existing satellite products measuring broadband outgoing longwave radiation (OLR) and reflected shortwave radiation (RSR) have spatio-temporal resolutions that are too coarse to evaluate important radiative forcers like aircraft condensation trails. We present a neural network which estimates OLR and RSR based on narrowband radiances, using collocated Cloud and Earth’s Radiant Energy System (CERES) and GOES-16 Advanced Baseline Imager (ABI) data. The resulting estimates feature strong agreement with the CERES data products (R^2 = 0.977 for OLR and 0.974 for RSR on CERES Level 2 footprints), and we provide open access to the collocated satellite data and model outputs on all available GOES-16 ABI data for the 4 years from 2018–2021.
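The agreement figures quoted in this abstract are R^2 values. As a minimal sketch, assuming the common definition R^2 = 1 - SS_res / SS_tot (the abstract does not say which definition was used), the metric could be computed as follows. The function name `r_squared` and the sample numbers are hypothetical and are not taken from the paper or its data.

```python
def r_squared(reference, estimate):
    """Coefficient of determination of `estimate` against `reference`.

    Assumes the definition R^2 = 1 - SS_res / SS_tot.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_ref = sum(reference) / len(reference)
    # Residual sum of squares: squared error of the estimate.
    ss_res = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # Total sum of squares: spread of the reference around its mean.
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up values standing in for reference and estimated fluxes.
reference_flux = [200.0, 220.0, 240.0, 260.0]
estimated_flux = [202.0, 219.0, 243.0, 257.0]
score = r_squared(reference_flux, estimated_flux)
```

A perfect estimate scores 1.0, and an estimate no better than predicting the reference mean everywhere scores 0.0.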
A Machine-Learning Based Objective Measure for ALS disease progression
Fernando Viera
Alan S Premasiri
Maeve McNally
Steven Perrin
npj Digital Medicine (2022)
Amyotrophic Lateral Sclerosis (ALS) disease progression is usually measured using the subjective, questionnaire-based revised ALS Functional Rating Scale (ALSFRS-R). A purely objective measure for tracking disease progression would be a powerful tool for evaluating real-world drug effectiveness and efficacy in clinical trials, as well as for identifying participants for cohort studies. Here we develop a machine-learning-based objective measure for ALS disease progression, based on voice samples and accelerometer measurements. The ALS Therapy Development Institute (ALS-TDI) collected a unique dataset of voice and accelerometer samples from consented individuals: 584 people living with ALS, over four years. Participants carried out prescribed speaking and limb-based tasks. 542 participants contributed 5814 voice recordings, and 350 contributed 13009 accelerometer samples, with ALSFRS-R measured at the same time. Using the data from 475 participants, we trained machine learning (ML) models, correlating voice with bulbar-related FRS scores and accelerometer with limb-related scores. On the test set (n=109 participants) the voice models achieved an AUC of 0.86 (95% CI, 0.847-0.884), whereas the accelerometer models achieved a median AUC of 0.73. We used the models and self-reported ALSFRS-R scores to evaluate the real-world effects of edaravone, a drug recently approved for use in ALS, on 54 test participants. In the test cohort, the digital data input into the ML models produced objective measures of progression rates over the duration of the study that were consistent with self-reported scores. This demonstrates the value of these tools for assessing both disease progression and potentially drug effects. In this instance, edaravone treatment produced highly variable outcomes from person to person, in both the self-reported and digital-ML measures.
A human-labeled Landsat contrails dataset
Vincent Rudolf Meijer
Erica Wickstrom Brand
Carl Elkin
ICML workshop on Climate Change 2021 (2021)
Contrails (condensation trails) are the ice clouds that trail behind aircraft as they fly through cold and moist regions of the atmosphere. Avoiding these regions could potentially be an inexpensive way to reduce over half of aviation's impact on global warming. Development and evaluation of these avoidance strategies greatly benefit from the ability to detect contrails on satellite imagery. Since little to no public data is available to develop such contrail detectors, we construct and release a dataset of several thousand Landsat-8 scenes with pixel-level annotations of contrails. The dataset will continue to grow, but currently contains 3431 scenes (of which 47% have at least one contrail) representing 800+ person-hours of labeling time.
Machine learning on DNA-encoded libraries: A new paradigm for hit-finding
Eric A. Sigel
Steven Kearnes
Ling Xue
Xia Tian
Dennis Moccia
Diana Gikunju
Sana Bazzaz
Betty Chan
Matthew A. Clark
John W. Cuozzo
Marie-Aude Guié
John P. Guilinger
Christelle Huguet
Christopher D. Hupp
Anthony D. Keefe
Christopher J. Mulhern
Ying Zhang
Patrick Francis Riley
Journal of Medicinal Chemistry (2020)
DNA-encoded small molecule libraries (DELs) have enabled discovery of novel inhibitors for many distinct protein targets of therapeutic value. We demonstrate a new approach applying machine learning to DEL selection data by identifying active molecules from large libraries of commercial and easily synthesizable compounds. We train models using only DEL selection data and apply automated or automatable filters to the predictions. We perform a large prospective study (∼2000 compounds) across three diverse protein targets: sEH (a hydrolase), ERα (a nuclear receptor), and c-KIT (a kinase). The approach is effective, with an overall hit rate of ∼30% at 30 μM and discovery of potent compounds (IC50 < 10 nM) for every target. The system makes useful predictions even for molecules dissimilar to the original DEL, and the compounds identified are diverse, predominantly drug-like, and different from known ligands. This work demonstrates a powerful new approach to hit-finding.
Evaluating Attribution for Graph Neural Networks
Alexander B Wiltschko
Benjamin Sanchez-Lengeling
Brian Lee
Jennifer Wei
Wesley Qian
Yiliu Wang
Advances in Neural Information Processing Systems 33 (2020)
Interpretability of machine learning models is critical to scientific understanding, AI safety, and debugging. Attribution is one approach to interpretability, which highlights input dimensions that are influential to a neural network’s prediction. Evaluation of these methods is largely qualitative for image and text models, because acquiring ground truth attributions requires expensive and unreliable human judgment. Attribution has been comparatively understudied for graph neural networks (GNNs), a model class of growing importance that makes predictions on arbitrarily-sized graphs. Graph-valued data offer an opportunity to quantitatively benchmark attribution methods, because challenging synthetic graph problems have computable ground-truth attributions. In this work we adapt commonly-used attribution methods for GNNs and quantitatively evaluate them using the axes of attribution accuracy, stability, faithfulness and consistency. We make concrete recommendations for which attribution methods to use, and provide the data and code for our benchmarking suite. Rigorous and open source benchmarking of attribution methods in graphs could enable new methods development and broader use of attribution in real-world ML tasks.
Preview abstract
Materials design enables technologies critical to humanity, including combating climate change with solar cells and batteries. Many properties of a material are determined by its atomic crystal structure. However, prediction of the atomic crystal structure for a given material's chemical formula is a long-standing grand challenge that remains a barrier in materials design. We investigate a data-driven approach to accelerating ab initio random structure search (AIRSS), a state-of-the-art method for crystal structure search. We build a novel dataset of random structure relaxations of Li-Si battery anode materials using high-throughput density functional theory calculations. We train graph neural networks to simulate relaxations of random structures. Our model is able to find an experimentally verified structure of Li15Si4 it was not trained on, and has potential for orders of magnitude speedup over AIRSS when searching large unit cells and searching over multiple chemical stoichiometries. Surprisingly, we find that data augmentation by adding Gaussian noise improves both the accuracy and out-of-domain generalization of our models.
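The Gaussian noise augmentation mentioned at the end of this abstract can be illustrated with a minimal sketch. It assumes the noise is added to atomic coordinates, which the abstract does not state. The function name `add_gaussian_noise`, the noise scale, and the coordinates are all hypothetical and are not taken from the paper.

```python
import random


def add_gaussian_noise(positions, sigma, rng):
    """Return a copy of `positions` with N(0, sigma^2) noise on every coordinate.

    `positions` is a list of (x, y, z) tuples. The input is left unmodified.
    """
    return [
        tuple(coord + rng.gauss(0.0, sigma) for coord in atom)
        for atom in positions
    ]


# Made-up two-atom structure. A fixed seed makes the sketch repeatable.
rng = random.Random(0)
structure = [(0.0, 0.0, 0.0), (1.5, 1.5, 1.5)]
noisy_structure = add_gaussian_noise(structure, sigma=0.05, rng=rng)
```

Each call yields a slightly perturbed copy of the structure, so one training example can be turned into many.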
Using attribution to decode binding mechanism in neural network models for chemistry
Ankur Taly
Federico Monti
Proceedings of the National Academy of Sciences (2019), pp. 201820657
Deep neural networks have achieved state-of-the-art accuracy at classifying molecules with respect to whether they bind to specific protein targets. A key breakthrough would occur if these models could reveal the fragment pharmacophores that are causally involved in binding. Extracting chemical details of binding from the networks could potentially lead to scientific discoveries about the mechanisms of drug actions. But doing so requires shining light into the black box that is the trained neural network model, a task that has proved difficult across many domains. Here we show how the binding mechanism learned by deep neural network models can be interrogated, using a recently described attribution method. We first work with carefully constructed synthetic datasets, in which the 'fragment logic' of binding is fully known. We find that networks that achieve perfect accuracy on held-out test datasets still learn spurious correlations due to biases in the datasets, and we are able to exploit this non-robustness to construct adversarial examples that fool the model. The dataset bias makes these models unreliable for accurately revealing information about the mechanisms of protein-ligand binding. In light of our findings, we prescribe a test that checks for dataset bias given a hypothesis. If the test fails, it indicates that the model must be simplified or regularized and/or that the training dataset requires augmentation.