Computational Methods for Prediction and Classification of G Protein-Coupled Receptors
G protein-coupled receptors (GPCRs) constitute the largest group of membrane receptor proteins in eukaryotes. Because of their significant roles in many physiological processes, such as vision, smell, and inflammation, GPCRs are the targets of many prescribed drugs. However, the functional and structural diversity of GPCRs has made their prediction and classification from amino acid sequence data a challenging bioinformatics problem. Because existing computational methods for predicting and classifying GPCRs focus on mammalian (mostly human) data, the ultimate goal of our project is to establish an ensemble approach and implement web-based software that can be used reliably on a wider range of organisms. As a first step, we have constructed a searchable MySQL database of experimentally confirmed GPCRs and non-GPCRs, along with protein features for distinguishing them. The database currently contains 2887 GPCR and 1614 non-GPCR sequences collected from the UniProtKB/Swiss-Prot protein database, covering over 300 species, including arthropods, fungi, and nematodes. Each protein in the database is assigned a unique identification number and linked to information about its source organism, sequence length, and other features, including amino acid and dipeptide composition. For the GPCRs, family classifications according to the popular GRAFS and IUPHAR systems are also included. This database will provide the training and testing data for subsequent steps in our ongoing work: evaluating existing computational tools, incorporating them into our ensemble, and applying them to identify potential GPCRs in several fly, mosquito, and tick species of biomedical or agricultural importance.
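The amino acid and dipeptide composition features mentioned above can be illustrated with a minimal Python sketch. The abstract does not say how these features are normalized or how non-standard residues are handled, so this sketch assumes fractional frequencies over the 20 standard amino acids and skips any other symbol. The function names are hypothetical and are not taken from the project's code.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes). Assumption: any other
# symbol (e.g. X, B, Z, U) is ignored when computing the features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def dipeptide_composition(sequence):
    """Return the fraction of each of the 400 possible adjacent residue pairs."""
    seq = sequence.upper()
    # Overlapping pairs of adjacent residues; keep only pairs in which
    # both residues are standard amino acids.
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    ]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    counts = Counter(pairs)
    return {
        a + b: counts[a + b] / len(pairs)
        for a, b in product(AMINO_ACIDS, repeat=2)
    }
```

Under these assumptions, each protein yields 20 amino acid fractions and 400 dipeptide fractions, giving a fixed-length vector of 420 values regardless of sequence length.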
Begum, Khodeza, "Computational Methods for Prediction and Classification of G Protein-Coupled Receptors" (2017). ETD Collection for University of Texas, El Paso. AAI10690089.