Benchmarking Machine Learning Methods for Molecular Property Prediction

Govinda Bahadur K. C., University of Texas at El Paso


Machine learning (ML) techniques have been widely applied in areas ranging from pattern recognition, natural language processing, and computer games to self-driving cars, clinical diagnostics, and molecular structure prediction, easing everyday life. Drug discovery is an expensive, complex, and time-consuming process, and the pharmaceutical industry hopes to leverage ML methods to expedite it. Molecular property prediction is one of the most important tasks in drug discovery. Because developing a new drug relies on a proper understanding of molecular properties, there has been great interest in the potential of ML models to predict them. In this dissertation, I have benchmarked several ML algorithms on a variety of drug-discovery-related property prediction tasks. More specifically, I compare several widely used ML algorithms with advanced ML algorithms such as the Directed Message Passing Neural Network (D-MPNN) across various molecular property prediction models. The traditional ML models are trained on computed molecular fingerprints, whereas D-MPNN is a graph-based neural network that learns by operating on the graph structure of the molecule. The work presented in this dissertation is available as free, user-friendly computational tools that cover a wide range of biochemical tasks, such as binding affinity calculation, drug-target activity prediction, and prediction of absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties. The tools can be accessed online via web servers, and the source code and datasets are available on GitHub for interested computational scientists to further validate and benchmark new algorithms.

Subject Area

Pharmaceutical sciences|Computer science|Artificial intelligence|Bioinformatics

Recommended Citation

K. C., Govinda Bahadur, "Benchmarking Machine Learning Methods for Molecular Property Prediction" (2020). ETD Collection for University of Texas, El Paso. AAI28262535.