Deep Learning Models for Scoring Protein-ligand Interaction Energies

Mahmudulla Hassan, University of Texas at El Paso


In recent years, the cheminformatics community has seen increasing success with machine learning-based scoring functions for estimating binding affinities. The prediction of protein-ligand binding affinities is crucial for drug discovery research. Many physics-based scoring functions have been developed over the years. More recently, machine learning approaches have been shown to boost the performance of traditional scoring functions. In this study, two scoring functions were developed: one is based on convolutional neural networks (CNNs), and the other, called DLSCORE, is based on an ensemble of fully connected neural networks. Both models were trained on the refined PDBbind (v.2016) dataset using different types of features. The results obtained from the CNN model were analyzed to show that nearest-neighbor features perform better than distributed features. Moreover, canonically oriented molecular structures proved to work better than randomly oriented structures. The DLSCORE model, which is an ensemble of 10 different networks, yielded a Pearson correlation coefficient of 0.82, a Spearman rho coefficient of 0.90, a Kendall tau coefficient of 0.74, an RMSE of 1.15 kcal/mol, and an MAE of 0.86 kcal/mol on the test set, outperforming two widely used scoring functions.
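
The five test-set metrics reported above can be illustrated with a minimal, standard-library Python sketch. It is not the author's code: the function names are hypothetical, the toy numbers are made up, and combining the networks by a simple mean is an assumption, since the abstract does not state how the ensemble's outputs are combined. The Kendall tau shown is the tau-a variant, which assumes no tied values.

```python
import math
from itertools import combinations


def ensemble_mean(predictions):
    """Average per-model predictions (assumed combination rule).

    predictions[m][i] is model m's predicted energy for complex i.
    """
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs; assumes no ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the inputs."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of the inputs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate(y_true, y_pred):
    """Collect the five metrics named in the abstract into one dictionary."""
    return {
        "pearson": pearson(y_true, y_pred),
        "spearman": spearman(y_true, y_pred),
        "kendall_tau": kendall_tau(y_true, y_pred),
        "rmse": rmse(y_true, y_pred),
        "mae": mae(y_true, y_pred),
    }


if __name__ == "__main__":
    # Made-up values for illustration only (not data from the study).
    measured = [1.0, 2.0, 3.0, 4.0]
    per_model = [[2.5, 0.5, 4.5, 2.5], [1.5, 1.5, 3.5, 3.5]]
    print(evaluate(measured, ensemble_mean(per_model)))
```

In practice, a library such as SciPy provides tested implementations of the three correlation coefficients, including tie-aware variants of Kendall's tau; the hand-written versions here only make the definitions explicit.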

Subject Area

Computer science

Recommended Citation

Hassan, Mahmudulla, "Deep Learning Models for Scoring Protein-ligand Interaction Energies" (2018). ETD Collection for University of Texas, El Paso. AAI10931107.