Date of Award


Degree Name

Master of Science


Department

Computer Science


First Advisor

Olac Fuentes

Second Advisor

Suman Sirimulla


The prediction of protein-ligand binding affinities is crucial for drug discovery research. Many physics-based scoring functions have been developed over the years, and in recent years the cheminformatics community has seen increasing success with machine learning-based scoring functions for estimating binding affinities. Machine learning approaches have been shown to boost the performance of traditional scoring functions. In this study, two scoring functions were developed: one is based on convolutional neural networks (CNNs), and the other, called DLSCORE, is based on an ensemble of fully connected neural networks. Both models were trained on the refined PDBbind (v.2016) dataset using different types of features. The results obtained from the CNN model were analyzed to show that nearest neighbor features perform better than distributed features. Moreover, canonically oriented molecular structures were shown to give better results than randomly oriented structures. The DLSCORE model, which is an ensemble of 10 different networks, yielded a Pearson correlation coefficient of 0.82, a Spearman Rho coefficient of 0.90, a Kendall Tau coefficient of 0.74, an RMSE of 1.15 kcal/mol, and an MAE of 0.86 kcal/mol on the test set, outperforming two very popular scoring functions.
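As an illustration of the kind of evaluation the abstract reports, the following is a minimal sketch of how predictions from several networks might be averaged into one ensemble score and then compared with measured affinities using the five reported metrics. It is not the thesis code: the function names, the simple averaging rule, and the toy numbers are all assumptions made for illustration, and the rank-based metrics here assume no tied values.

```python
import math


def ensemble_mean(predictions_per_model):
    """Average the per-sample predictions of several models (assumed combination rule)."""
    n_models = len(predictions_per_model)
    return [sum(column) / n_models for column in zip(*predictions_per_model)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """Ranks starting at 1; assumes no ties among the values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)


def rmse(predicted, measured):
    """Root-mean-square error, in the same unit as the inputs."""
    squared = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    return math.sqrt(squared / len(measured))


def mae(predicted, measured):
    """Mean absolute error, in the same unit as the inputs."""
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


# Toy data only (hypothetical values, not results from the thesis):
# two models, three complexes, affinities in kcal/mol.
toy_predictions = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
toy_measured = [2.0, 3.0, 5.0]

toy_ensemble = ensemble_mean(toy_predictions)
print("Pearson:", pearson(toy_ensemble, toy_measured))
print("Spearman:", spearman(toy_ensemble, toy_measured))
print("Kendall:", kendall_tau(toy_ensemble, toy_measured))
print("RMSE:", rmse(toy_ensemble, toy_measured))
print("MAE:", mae(toy_ensemble, toy_measured))
```

The three correlation coefficients measure how well the predicted ordering and trend match experiment, while RMSE and MAE measure the absolute error in kcal/mol, which is why the abstract reports both groups.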




Received from ProQuest

File Size

72 pages

File Format


Rights Holder

Md Mahmudulla Hassan