Degree Name

Master of Science


Mathematical Sciences


Ming-Ying M. Leung


A single variation in the genetic sequence within the DNA of an organism can have beneficial, detrimental, or neutral effects. More often than not, these effects are detrimental rather than beneficial. Many biomedical and bioinformatics studies have been conducted to determine the genetic cause of prostate cancer (PrCa), which is still the second leading cause of cancer-related death among men in the United States, and an appreciable effort in statistical bioinformatics research has been directed towards this aim. Through statistical analyses of a set of whole exome sequencing data from patients with PrCa, obtained via The Cancer Genome Atlas (TCGA), this work seeks to augment current efforts by employing both partitional and hierarchical clustering methods to find groups of highly correlated genes associated with PrCa. Scan statistics were also used to identify possible mutational hotspots on those genes containing high numbers of genetic sequence variants. Our results indicated three pairs of variants that are consistently grouped together by multiple clustering methods. Furthermore, we found small regions on several genes containing an unusually high concentration of sequence variants, which might suggest mutational hotspots that predispose individuals to PrCa. These results will be reported to biomedical scientists for further bioinformatics analyses and wet lab studies.

Key words: Statistical bioinformatics, Partitional clustering, Hierarchical clustering, Scan statistics, The Cancer Genome Atlas (TCGA), Prostate cancer, Whole exome sequencing (WES).
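
As an illustration of the scan statistic idea mentioned in the abstract, the following minimal Python sketch computes the largest number of variant positions that fall inside any fixed-length window along a gene. It is not the procedure used in the thesis, which is not described on this page. The function name `max_window_count`, the window length, and the example positions are all hypothetical. The sketch only finds the observed maximum; judging whether that maximum is unusually high requires a null distribution, which is not shown here.

```python
def max_window_count(positions, window):
    """Return (count, start) for the half-open window [start, start + window)
    that contains the most positions. Windows are anchored at observed
    positions, which is enough to find the maximum count."""
    pts = sorted(positions)
    best_count, best_start = 0, None
    right = 0  # index of the first position outside the current window
    for left, start in enumerate(pts):
        # Advance the right edge past every position inside [start, start + window).
        while right < len(pts) and pts[right] < start + window:
            right += 1
        count = right - left
        if count > best_count:
            best_count, best_start = count, start
    return best_count, best_start


# Hypothetical variant positions (base-pair offsets) on a single gene.
example_positions = [5, 120, 125, 128, 131, 400, 910]
print(max_window_count(example_positions, window=20))  # prints (4, 120)
```

In this made-up example, four of the seven variants fall within a 20-base window starting at position 120, which is the kind of local concentration a scan statistic is designed to flag.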




Received from ProQuest

File Size

104 p.
