Degree Name

Doctor of Philosophy


Computational Science


Ming-Ying Leung


With the ever-increasing variety of sequencing techniques, the volume and scope of genomic data have expanded explosively, offering unparalleled opportunities for researchers to study gene-disease associations, identify biomarkers, and thus develop more effective diagnostic and therapeutic strategies. In this project, I have developed a computational workflow and a new scoring scheme that combine statistical frequency-based analyses with two well-established functional effect prediction tools, FATHMM and PROVEAN, to evaluate nonsynonymous GSVs and identify potential cancer-related protein-coding genes for downstream enrichment and protein-protein interaction (PPI) studies.

This method has been applied to a collection of 503 whole exome sequencing datasets from patients with prostate cancer (PrCa). The datasets were downloaded from The Cancer Genome Atlas as variant call format (VCF) files containing GSV information for paired tumor and normal samples. Exploratory statistics revealed an unusually high level of G→A and C→T transitions among cancer samples. Furthermore, five GSVs were found to be significantly associated with the disease. Among the 61 high-scoring genes identified by the scoring scheme, 27 were found by PPI analysis to have degrees of connection ≥ 4 with well-known PrCa-related genes. While 18 of them are reportedly associated with PrCa, nine genes (TRRAP, EPHB1, HERC2, MCM3, SPTA1, SALL1, HERC1, TTN, and MYH6) have not been previously documented in relation to PrCa. Their potential roles in PrCa could be investigated through further bioinformatics and wet-lab studies.
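
As an illustration of the exploratory statistics mentioned above, the following is a minimal sketch of how single-nucleotide substitutions in VCF records might be tallied to see how often changes such as G→A and C→T occur. It is not the workflow developed in the dissertation. The function names `count_substitutions` and `transition_fraction` are hypothetical, and the sketch assumes plain tab-separated VCF data lines with REF in the fourth column and ALT in the fifth.

```python
from collections import Counter

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def count_substitutions(vcf_lines):
    """Count single-nucleotide substitutions as (ref, alt) pairs.

    Hypothetical helper: skips header lines, indels, and malformed records.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # meta-information and column header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete data line
        ref = fields[3].upper()
        # ALT may list several comma-separated alleles.
        for alt in fields[4].upper().split(","):
            is_snv = len(ref) == 1 and len(alt) == 1
            if is_snv and ref in "ACGT" and alt in "ACGT" and ref != alt:
                counts[(ref, alt)] += 1
    return counts


def transition_fraction(counts):
    """Fraction of counted substitutions that are transitions."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    ts = sum(n for pair, n in counts.items() if pair in TRANSITIONS)
    return ts / total


# Toy records for illustration only; these are not data from the study.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tG\tA\t.\tPASS\t.",
    "chr1\t200\t.\tC\tT,G\t.\tPASS\t.",
    "chr2\t300\t.\tAT\tA\t.\tPASS\t.",
]
example_counts = count_substitutions(example)
```

Comparing such counts between tumor and normal samples is one simple way to see whether particular substitution types are enriched. Any statistical test applied on top of the counts would be a separate design choice.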




Received from ProQuest

File Size

122 p.

Rights Holder

Bofei Wang

Included in

Biology Commons