GPCR-PEnDB: A Database of Protein Sequences and Derived Features to Facilitate Prediction and Classification of G Protein-coupled Receptors
G protein-coupled receptors (GPCRs) constitute the largest group of membrane receptor proteins in eukaryotes. Because of their significant roles in physiological processes such as vision, smell, and inflammation, GPCRs are the targets of many prescription drugs. However, the functional and sequence diversity of GPCRs has kept their prediction and classification from amino acid sequence data a challenging bioinformatics problem. Existing computational approaches, mainly based on machine learning and statistical methods, predict and classify GPCRs from amino acid sequences and sequence-derived features. In this project, we have constructed a searchable MySQL database, named GPCR-PEnDB (GPCR Prediction Ensemble Database), of confirmed GPCRs and non-GPCRs. Its goal is to allow users to conveniently access useful information about GPCRs in a wide range of organisms and to compile reliable training and testing datasets for different combinations of computational tools. GPCR-PEnDB currently contains 3129 confirmed GPCR and 3575 non-GPCR sequences collected from the UniProtKB/Swiss-Prot protein database, encompassing over 1200 species. The non-GPCR entries include transmembrane proteins so that users can evaluate how well prediction programs distinguish GPCRs from other transmembrane proteins. Each protein is linked to information about its source organism, classification, sequence length and composition, and other derived sequence features. Compared to GPCRdb, which is considered the most comprehensive GPCR resource available, our database contains far fewer GPCR sequences because we require every GPCR to be confirmed. Nevertheless, our database contains 1094 GPCRs not found in GPCRdb. In particular, all of the Class D and E GPCRs and many of the Class A sensory receptors are missing from GPCRdb.
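As an illustration of the kind of derived sequence features mentioned above, the following minimal Python sketch computes the length and amino acid composition of a single protein sequence. The function name `sequence_features` and the returned dictionary layout are hypothetical choices made for this example; they are not the schema or the code used to build GPCR-PEnDB.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(sequence):
    """Return the length and amino acid composition of a protein sequence.

    Hypothetical helper for illustration only. Composition is the fraction
    of residues that are each standard amino acid; non-standard letters
    count toward the length but get no composition entry of their own.
    """
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    composition = {aa: counts.get(aa, 0) / len(seq) for aa in AMINO_ACIDS}
    return {"length": len(seq), "composition": composition}


if __name__ == "__main__":
    # Toy sequence, not a real GPCR.
    features = sequence_features("MKTAYIAK")
    print(features["length"])
    print(features["composition"]["K"])
```

Features of this kind could be stored alongside each sequence so that queries can filter proteins by length or by the fraction of a given residue.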
I will present several examples of using GPCR-PEnDB and its graphical user interface to query for GPCRs with specific sequence properties and to compare the prediction accuracies of GPCR prediction tools. This initial version of GPCR-PEnDB will provide a framework for future extensions that include additional sequence features, three-dimensional structural data, and ligand binding information. These extensions are intended to facilitate the design and assessment of GPCR prediction and classification tools, as well as experimental studies that help clarify the functional roles of various types of GPCRs.
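Comparing the prediction accuracies of tools on a labeled set of GPCRs and non-GPCRs can be sketched as follows. This is a minimal example that assumes each tool's output has already been reduced to one boolean per sequence (True for a predicted GPCR); the function name `evaluate_predictions` and the choice of metrics are assumptions made for illustration, not the evaluation procedure of the thesis.

```python
def evaluate_predictions(true_labels, predicted_labels):
    """Compute accuracy, sensitivity, and specificity for a binary classifier.

    Hypothetical helper: both arguments are equal-length sequences of
    booleans, where True means GPCR and False means non-GPCR.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")

    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)          # GPCRs predicted as GPCRs
    tn = sum(1 for t, p in pairs if not t and not p)  # non-GPCRs predicted as non-GPCRs
    fp = sum(1 for t, p in pairs if not t and p)      # non-GPCRs predicted as GPCRs
    fn = sum(1 for t, p in pairs if t and not p)      # GPCRs predicted as non-GPCRs

    return {
        "accuracy": (tp + tn) / len(pairs),
        # Report None when a class is absent so no division by zero occurs.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


if __name__ == "__main__":
    # Toy labels for four sequences: two GPCRs followed by two non-GPCRs.
    truth = [True, True, False, False]
    tool_output = [True, False, False, False]
    print(evaluate_predictions(truth, tool_output))
```

Running the same function on the outputs of several tools over an identical test set gives directly comparable numbers. Specificity on the transmembrane non-GPCR entries is the figure that shows how well a tool separates GPCRs from other transmembrane proteins.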
Begum, Khodeza, "GPCR-PEnDB: A Database of Protein Sequences and Derived Features to Facilitate Prediction and Classification of G Protein-coupled Receptors" (2019). ETD Collection for University of Texas, El Paso. AAI27667717.