Deep Learning Approach for the Discovery of Tumor-Targeting Small Organic Ligands from DNA-Encoded Chemical Libraries
DNA-Encoded Chemical Libraries (DELs) have emerged as efficient and cost-effective ligand discovery tools that enable the generation of protein-ligand interaction data of unprecedented size. In this article, we present an approach that combines DEL screening and instance-level deep learning modeling to identify tumor-targeting ligands against carbonic anhydrase IX (CAIX), a clinically validated marker of hypoxia and clear cell renal cell carcinoma. The resulting ligand identification and hit-to-lead strategy is driven by machine learning models trained on DEL data and expands the scope of DEL-derived chemical motifs. CAIX screening datasets obtained from three different DELs were used to train machine learning models that generate novel hits dissimilar to the compounds present in the original DELs. Of the 152 novel potential hits identified with our approach and screened in an in vitro enzymatic inhibition assay, 70% displayed submicromolar activity (IC50 < 1 µM). To generate lead compounds that are functionalized with anticancer payloads, analogues of the top hits were prioritized for synthesis based on predicted CAIX affinity and synthetic feasibility. Three lead candidates accumulated on the surface of CAIX-expressing tumor cells in cellular binding assays. The best compound displayed an in vitro KD of 5.7 nM and selectively targeted tumors in mice bearing human renal cell carcinoma lesions. Our results demonstrate the synergy between DELs and machine learning for the identification of novel hits and for the successful translation of lead candidates into in vivo targeting applications.
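
The requirement that novel hits be dissimilar to the compounds already present in the DELs can be illustrated with a small novelty filter. The sketch below is not the method used in this work: it assumes each molecule is represented as a set of hashed substructure features (a hypothetical stand-in for a real fingerprint), uses Tanimoto similarity as the metric, and applies an arbitrary cutoff of 0.4. The names `tanimoto`, `novel_candidates`, `library`, and `candidates` are all illustrative.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # assumed representation: set of "on" feature bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def novel_candidates(
    candidates: Dict[str, Fingerprint],
    library: List[Fingerprint],
    max_similarity: float = 0.4,  # illustrative cutoff, not taken from the paper
) -> List[Tuple[str, float]]:
    """Keep candidates whose nearest library neighbour is below the cutoff."""
    kept = []
    for name, fp in candidates.items():
        nearest = max((tanimoto(fp, lib_fp) for lib_fp in library), default=0.0)
        if nearest < max_similarity:
            kept.append((name, nearest))
    return kept


# Toy data: two library members and two model-proposed candidates.
library = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
candidates = {
    "cand_a": frozenset({1, 2, 3, 4, 6}),  # near-duplicate of a library member
    "cand_b": frozenset({2, 7, 8, 9}),     # shares little with the library
}

if __name__ == "__main__":
    for name, nearest in novel_candidates(candidates, library):
        print(f"{name}: nearest library similarity = {nearest:.2f}")
```

In practice the feature sets would come from a cheminformatics toolkit, and the cutoff would be chosen to balance novelty against the reliability of the model's predictions far from its training data.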