Integrated Statistical and Machine Learning Algorithms for Predicting and Classifying G Protein-Coupled Receptors
G protein-coupled receptors (GPCRs) are transmembrane proteins that play important roles in signal transduction and often serve as drug targets. With the increasing availability of protein sequence information, there is much interest in computationally predicting GPCRs and classifying them according to their biological roles. Such predictions are cost-efficient and can be valuable guides for designing wet lab experiments that help elucidate signaling pathways and expedite drug discovery. Existing computational tools for GPCR prediction apply principal component analysis (PCA), intimate sorting (IS), support vector machine (SVM), and random forest (RF) techniques to various sequence-derived features. While accuracies of over 90% were reported on each tool's own test dataset, none of these tools has been evaluated on its ability to distinguish GPCRs from transmembrane non-GPCRs. Furthermore, no direct comparison of the different approaches has been conducted. In this project, we have established two new GPCR prediction algorithms that integrate combinations of PCA, IS, and RF with univariate feature selection, a method that has not previously been used for GPCR prediction. The same 1355 sequence features are used uniformly across the algorithms, which are evaluated on a test dataset of 2179 positive examples (confirmed GPCRs) and 3781 negative examples, including transmembrane non-GPCRs. Overall prediction accuracies are over 90%, and the false positive rates among the transmembrane non-GPCRs are substantially lower than those of existing tools. These results suggest that integrated algorithms perform well for GPCR prediction. We plan to further explore different integrated prediction approaches and apply them to the GPCR classification problem in the future.
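The two evaluation measures named in the abstract, overall accuracy and the false positive rate restricted to transmembrane non-GPCRs, can be illustrated with a minimal sketch. The function names, the group labels (`"gpcr"`, `"tm_non_gpcr"`, `"other"`), and the toy predictions are assumptions made for illustration only; they are not taken from the thesis and do not reproduce its data or results.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of examples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def false_positive_rate(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[str],
    group: str,
) -> float:
    """Fraction of true negatives in `group` that were predicted positive."""
    preds = [
        p for t, p, g in zip(y_true, y_pred, groups) if t == 0 and g == group
    ]
    if not preds:
        raise ValueError(f"no negative examples found in group {group!r}")
    return sum(preds) / len(preds)


# Toy data (hypothetical): 1 = GPCR, 0 = non-GPCR.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
groups = [
    "gpcr", "gpcr", "gpcr",
    "tm_non_gpcr", "tm_non_gpcr", "tm_non_gpcr", "tm_non_gpcr",
    "other",
]

overall_acc = accuracy(y_true, y_pred)
tm_fpr = false_positive_rate(y_true, y_pred, groups, "tm_non_gpcr")
print(f"accuracy={overall_acc:.2f}, FPR among TM non-GPCRs={tm_fpr:.2f}")
```

Reporting the false positive rate separately for the transmembrane negatives matters because a predictor can reach high overall accuracy while still mislabeling many of the non-GPCRs that most resemble GPCRs.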
Ayivor, Fredrick, "Integrated Statistical and Machine Learning Algorithms for Predicting and Classifying G Protein-Coupled Receptors" (2018). ETD Collection for University of Texas, El Paso. AAI13424677.