Date of Award
2025-05-01
Degree Name
Doctor of Philosophy
Department
Computational Science
Advisor(s)
Ming-Ying Leung
Abstract
A quantitative integrated scoring function, iQ(G), was developed to assess the cumulative effects of nonsynonymous single nucleotide variants (SNVs) on protein-coding genes, with the goal of finding novel candidate cancer-related genes in patients with acute myeloid leukemia (AML), acute lymphoblastic leukemia (ALL), and ovarian cancer (OC). With the Genomic Data Commons as the primary data resource for this project, whole-exome SNV data were extracted for patients with one of these three cancers. For each specific cancer, the iQ(G) function sums the deleterious effects of individual SNVs with respect to the transcripts of the gene G in which they occur, weighted by the difference in occurrence frequency between tumor and normal samples among patients and accounting for transcript lengths, to provide an overall cumulative pathogenic score for the gene. Once the iQ(G) scores are obtained, the genes can be ranked accordingly, and the top-ranking genes are considered likely to be associated with the cancer. In this study, we applied iQ(G) scoring using four established SNV effect analyzers, namely FATHMM-XF, SIFT, PolyPhen, and CADD, as well as their averages. With a compiled list of known genes for each cancer type, we assessed the performance of iQ(G) when used with the individual analyzers and with two integrative approaches that averaged the variant effects. The assessment results suggested that the integrated average approach had an overall advantage over using individual analyzers. Downstream bioinformatics analysis, including protein-protein interaction, gene ontology, and pathway analysis, performed on the top-scoring genes revealed similar carcinogenic pathways among the three cancers. This computational framework can be easily adapted to analyze SNV datasets for other cancers and to accommodate new SNV effect analyzers as they are developed in the future.
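The abstract describes the scoring idea only in words, so the following Python sketch is one possible reading of it, not the dissertation's actual formula. It assumes each SNV carries an effect score already normalized so that higher means more deleterious, that the weight is the plain difference between tumor and normal occurrence frequencies, and that the transcript-length adjustment is a simple division. The names `Snv`, `mean_effect`, `iq_scores`, and `rank_genes` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Snv:
    """One nonsynonymous SNV (hypothetical record layout)."""
    gene: str
    effect: float           # assumed normalized; higher = more deleterious
    tumor_freq: float       # occurrence frequency among tumor samples
    normal_freq: float      # occurrence frequency among normal samples
    transcript_length: int  # length of the affected transcript


def mean_effect(analyzer_scores):
    """Average the normalized scores from several effect analyzers."""
    return sum(analyzer_scores) / len(analyzer_scores)


def iq_scores(snvs):
    """Sum frequency-weighted, length-adjusted SNV effects per gene.

    Assumed form: sum of effect * (tumor_freq - normal_freq) / length.
    """
    scores = defaultdict(float)
    for v in snvs:
        weight = v.tumor_freq - v.normal_freq
        scores[v.gene] += v.effect * weight / v.transcript_length
    return dict(scores)


def rank_genes(scores):
    """Return gene names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up toy data, for illustration only.
variants = [
    Snv("GENE_A", mean_effect([0.5, 0.5]), 0.5, 0.25, 1000),
    Snv("GENE_A", 1.0, 0.5, 0.0, 1000),
    Snv("GENE_B", 1.0, 0.25, 0.25, 500),
]
ranking = rank_genes(iq_scores(variants))
```

Under these assumptions, a variant that is equally frequent in tumor and normal samples contributes nothing to its gene's score, which is why the frequency difference acts as the weight in this sketch.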
Language
en
Provenance
Received from ProQuest
Copyright Date
2025-05
File Size
109 p.
File Format
application/pdf
Rights Holder
Amanda Maria Bataycan
Recommended Citation
Bataycan, Amanda Maria, "Computational Framework for Integrating Single Nucleotide Variant Scores to Identify Novel Genes in Cancers" (2025). Open Access Theses & Dissertations. 4334.
https://scholarworks.utep.edu/open_etd/4334