- Vikram Sundar
- Lucy Colwell
Machine learning (ML) models trained to predict ligand binding to a single protein have achieved remarkable success, but they cannot make predictions for protein targets other than the one they were trained on. Models that make predictions for multiple proteins and multiple ligands, known as drug-target interaction (DTI) models, aim to solve this problem but generally perform worse. In this work, we improve the performance of DTI models by taking advantage of the accuracy of single-protein ligand binding models. Specifically, we first construct an individual ligand binding model for every training protein that has some experimental data, then use each individual model to predict binding of all remaining ligands to the corresponding protein target. Finally, we use the known and predicted ligand binding data for all training targets in a DTI model to make predictions for the unseen test proteins. This approach significantly improves performance; most importantly, some of our models achieve Areas Under the Receiver Operating Characteristic curve (AUCs) exceeding $0.9$ on test datasets that contain only unseen proteins and unseen ligands.
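
The three steps described above (fit one model per training protein, fill in the unmeasured protein/ligand pairs, then train a DTI model on the completed data) can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration rather than the models used in this work: ligands are represented as sets of fingerprint bits, the single-protein model is a Tanimoto-weighted vote, the DTI model is a protein-similarity-weighted average, and the names `impute_matrix` and `predict_dti` are hypothetical. For brevity the sketch scores ligands that appear in the training matrix and does not cover the unseen-ligand setting.

```python
from typing import Dict, FrozenSet

Fingerprint = FrozenSet[int]  # toy ligand fingerprint: the set of "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two bit-set fingerprints."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_single_protein(known: Dict[str, int],
                           ligands: Dict[str, Fingerprint],
                           query: str) -> float:
    """Stand-in single-protein model (an assumption, not the paper's model):
    similarity-weighted vote over the ligands measured against one protein."""
    pairs = [(tanimoto(ligands[query], ligands[lig]), label)
             for lig, label in known.items()]
    total = sum(weight for weight, _ in pairs)
    if total == 0.0:
        return 0.5  # no similar measured ligand: fall back to an uninformative score
    return sum(weight * label for weight, label in pairs) / total


def impute_matrix(measured: Dict[str, Dict[str, int]],
                  ligands: Dict[str, Fingerprint]) -> Dict[str, Dict[str, float]]:
    """Steps 1 and 2: one model per training protein, then fill in every
    ligand that was not measured against that protein."""
    dense: Dict[str, Dict[str, float]] = {}
    for protein, known in measured.items():
        dense[protein] = {
            lig: float(known[lig]) if lig in known
            else predict_single_protein(known, ligands, lig)
            for lig in ligands
        }
    return dense


def predict_dti(dense: Dict[str, Dict[str, float]],
                protein_similarity: Dict[str, float],
                ligand: str) -> float:
    """Step 3, stand-in DTI model: score a ligand against an unseen protein by
    averaging the completed training rows, weighted by protein similarity."""
    total = sum(protein_similarity[p] for p in dense)
    return sum(protein_similarity[p] * dense[p][ligand] for p in dense) / total


# Toy data: three ligands, two training proteins with sparse measurements.
ligands = {
    "L1": frozenset({1, 2, 3}),
    "L2": frozenset({1, 2, 4}),
    "L3": frozenset({7, 8, 9}),
}
measured = {"P1": {"L1": 1, "L3": 0}, "P2": {"L2": 1}}

dense = impute_matrix(measured, ligands)
# Hypothetical similarities between an unseen test protein and P1, P2.
score = predict_dti(dense, {"P1": 0.8, "P2": 0.2}, "L3")
print(dense)
print(score)
```

The point of the sketch is the data flow: the final predictor sees a fully populated protein/ligand matrix rather than the sparse set of measured pairs, so the quality of the single-protein models directly determines how much useful signal the DTI stage receives.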