Hit Expansion Driven By Machine Learning
Abstract
Recent work \cite{McCloskey2020-es} used experimental data from DNA-encoded library (DEL) selections to train graph convolutional neural networks (GCNNs) \cite{Kearnes2016-sk} to identify hit compounds for protein targets, and its prospective tests demonstrated unprecedented hit rates for three diverse proteins. Building on this work, we propose two novel approaches that leverage the DEL GCNN model's predictions and embeddings to automate hit expansion, a critical step in real-world drug discovery that guides the optimization of initial hit compounds toward clinical candidates. We prospectively tested the proposed approaches on a protein target (sEH) in the wet lab. Our methods identified more small molecules with higher potency than traditional hit expansion based on molecular fingerprint similarity search. Specifically, our approaches discovered $34$ analog compounds that are more potent than an sEH clinical trial candidate. All sEH assay results are available at \url{https://www.tdcommons.org/dpubs_series/7414/}. Furthermore, by applying the automated hit expansion approach to a novel protein target (WDR91) with no previously known inhibitors, we discovered two active covalent analogs, the first reported small-molecule ligands for this previously unexplored target.