Drew Bryant

Authored Publications
    Longitudinal fundus imaging and its genome-wide association analysis provides evidence for a human retinal aging clock
    Sara Ahadi
    Kenneth A Wilson Jr
    Orion Pritchard
    Ajay Kumar
    Enrique M Carrera
    Ricardo Lamy
    Jay M Stewart
    Avinash Varadarajan
    Pankaj Kapahi
    Ali Bashir
    eLife (2023)
    Background: Biological age, distinct from an individual’s chronological age, has been studied extensively through predictive aging clocks. However, these clocks have limited accuracy on short time-scales. Deep learning approaches on imaging datasets of the eye have proven powerful for a variety of quantitative phenotype inference tasks and provide an opportunity to explore organismal aging and tissue health.
    Methods: Here we trained deep learning models on fundus images from the EyePacs dataset to predict individuals’ chronological age. These predictions lead to the concept of a retinal aging clock, which we then employed for a series of downstream longitudinal analyses. The retinal aging clock was used to assess the predictive power of aging inference, termed eyeAge, on short time-scales using longitudinal fundus imaging data from a subset of patients. Additionally, the model was applied to a separate cohort from the UK Biobank to validate the model and perform a GWAS. The top candidate gene was then tested in a fly model of eye aging.
    Findings: EyeAge was able to predict age with a mean absolute error of 3.26 years, which is much lower than that of other aging clocks. Additionally, eyeAge was highly independent of blood marker-based measures of biological age (e.g. “phenotypic age”), maintaining a hazard ratio of 1.026 even in the presence of phenotypic age. Longitudinal studies showed that the resulting models were able to predict individuals’ aging on time-scales of less than a year with 71% accuracy. Notably, we observed a significant individual-specific component to the prediction. This observation was confirmed with the identification of multiple GWAS hits in the independent UK Biobank cohort. The knockdown of the top hit, ALKAL2, which was previously shown to extend lifespan in flies, also slowed age-related decline in vision in flies.
    Interpretation: In conclusion, predicted age from retinal images can be used as a biomarker of biological aging in a given individual independently from phenotypic age. This study demonstrates the utility of the retinal aging clock for studying aging and age-related diseases and quantitatively measuring aging on very short time-scales, potentially opening avenues for quick and actionable evaluation of gero-protective therapeutics.
    Deep diversification of an AAV capsid protein by machine learning
    Ali Bashir
    Sam Sinai
    Nina K. Jain
    Pierce J. Ogden
    Patrick F. Riley
    George M. Church
    Eric D. Kelsic
    Nature Biotechnology (2021)
    Modern experimental technologies can assay large numbers of biological sequences, but engineered protein libraries rarely exceed the sequence diversity of natural protein families. Machine learning (ML) models trained directly on experimental data without biophysical modeling provide one route to accessing the full potential diversity of engineered proteins. Here we apply deep learning to design highly diverse adeno-associated virus 2 (AAV2) capsid protein variants that remain viable for packaging of a DNA payload. Focusing on a 28-amino acid segment, we generated 201,426 variants of the AAV2 wild-type (WT) sequence yielding 110,689 viable engineered capsids, 57,348 of which surpass the average diversity of natural AAV serotype sequences, with 12–29 mutations across this region. Even when trained on limited data, deep neural network models accurately predict capsid viability across diverse variants. This approach unlocks vast areas of functional but previously unreachable sequence space, with many potential applications for the generation of improved viral vectors and protein therapeutics.
    Critiquing Protein Family Classification Models Using Sufficient Input Subsets
    Brandon Michael Carter
    Jamie Alexander Smith
    Theo Sanderson
    ACM SIGKDD Conference on Knowledge Discovery and Data Mining (2019) (to appear)
    In many application domains, neural networks are highly accurate and have been deployed at large scale. However, users often do not have good tools for understanding how these models arrive at their predictions. This has hindered adoption in fields such as the life and medical sciences, where researchers require that models base their decisions on underlying biological phenomena rather than peculiarities of the dataset introduced, e.g., as a function of when and how the data were collected. In response, we propose a set of methods for critiquing deep learning models, and demonstrate their application for protein family classification, a task for which high-accuracy models have considerable potential impact. Our methods extend the recently-introduced sufficient input subsets technique (SIS), which we use to identify the subset of locations (SIS) in each protein sequence that is sufficient for classification. Our suite of tools analyzes these SIS to shed light on the decision making criteria employed by models trained on this task. These tools expose that while these deep models may perform classification for biologically-relevant reasons, their behavior varies considerably across choice of network architecture and parameter initialization. While the techniques that we develop are specific to the protein sequence classification task, the approach taken generalizes to a broad set of scientific contexts in which model interpretability is essential. We encourage further application of our techniques for interrogating machine learning models trained on other scientifically relevant tasks.
    Understanding the relationship between amino acid sequence and protein function is a long-standing problem in molecular biology with far-reaching scientific implications. Despite six decades of progress, state-of-the-art techniques cannot annotate ~1/3 of microbial protein sequences, hampering our ability to exploit sequences collected from diverse organisms. To address this, we report a deep learning model that learns the relationship between unaligned amino acid sequences and their functional classification across all 17929 families of the Pfam database. Using the Pfam seed sequences, we establish a rigorous benchmark assessment and find that a dilated convolutional model reduces the error of state-of-the-art BLASTp and pHMM models by a factor of nine. With 80% of the full Pfam database, we train a protein family predictor that is more accurate and over 200 times faster than BLASTp, while learning sequence features such as structural disorder and transmembrane helices. Our model co-locates sequences from unseen families in embedding space far from existing families, allowing sequences from novel families to be classified. We anticipate that deep learning models will be a core component of future general-purpose protein function prediction tools.
    Low-carbon electricity technologies are often evaluated by their Levelized Cost of Energy (LCOE). However, LCOE cannot model the impact of one electricity source on the value of others. In previous work, System LCOE was proposed to estimate the costs of integrating an intermittent source into a grid consisting of multiple dispatchable electricity sources. Using a new DOSCOE (Dispatch-optimized system cost of electricity) model, we generalize System LCOE. DOSCOE can handle any mixture of dispatchable and non-dispatchable sources. It can analyze systems which contain storage, have legacy infrastructure, or have imposed policies. DOSCOE thus updates System LCOE to be applicable to more realistic electricity grid models. DOSCOE uses a linear program to find the capacity and generation mix which yields minimum LCOE. Running this linear program multiple times yields System LCOE curves. DOSCOE shows that to cost-effectively remove the last 10-20% of fossil fuels requires a moderate price on carbon and either low-cost nuclear power or carbon capture and sequestration. Alternatively, a hypothetical zero-carbon source needs to have a net present cost less than $2200/kW to displace existing fossil-fuel plants.