Merging Clinical and Genomic Data in Patients With Acute Leukemia for Downstream Analysis

Amanda M Bataycan, University of Texas at El Paso


The purpose of this study is to integrate multiple sources of information from patients with acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) into organized datasets that enable downstream bioinformatics and statistical analyses of the patients’ survival status and overall survival times in relation to their demographic, clinical, and genomic mutation profiles. With the NIH Genomic Data Commons as the primary data resource and cBioPortal as the access portal, datasets on 149 unique patients with AML and 603 unique patients with ALL were obtained. Python scripts were written to compile individual patients’ single nucleotide variant (SNV) data files into one dataset for each patient group. In both groups, more than 95% of the SNVs occurred only in tumor samples, whereas fewer than 0.02% occurred only in normal samples. Compared with variants in normal samples, SNVs in tumor samples favored change types that reduce the GC content of genes in both patient groups. Additional results showed shifts of variant densities on all chromosomes, most noticeably on chromosome 11 in patients with AML and chromosome 2 in patients with ALL. One important task accomplished in this work was merging the individual patients’ SNV data with their corresponding demographic and clinical information, which includes ethnicity and race, disease classification or staging, and survival outcomes, among other variables. With the merged data, we propose several bioinformatics studies to investigate the functional effects of SNVs and to select likely leukemia-associated genes not yet reported in the published literature. SNV occurrence frequencies in the selected genes will augment the patients’ demographic and clinical information to form the final set of variables to be analyzed. Our goal is to establish a predictive model for patients’ overall survival times to facilitate discoveries of potential gene therapy targets for acute leukemia.
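The compile-and-merge step described above can be illustrated with a minimal sketch. It is not the study's actual code: the function names, column names (`chrom`, `pos`, `ref`, `alt`, `diagnosis`, `vital_status`), and patient identifiers are hypothetical, and the per-patient SNV files are assumed to be tab-delimited tables with a header row.

```python
import csv
import io


def compile_snvs(per_patient_tables):
    """Stack per-patient SNV tables into one list of rows tagged with patient_id.

    per_patient_tables maps a (hypothetical) patient ID to the text of that
    patient's tab-delimited SNV table.
    """
    combined = []
    for patient_id, text in per_patient_tables.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            row["patient_id"] = patient_id  # record which patient the SNV came from
            combined.append(row)
    return combined


def merge_clinical(snv_rows, clinical_by_patient):
    """Left-join clinical fields onto each SNV row by patient_id.

    SNV rows whose patient has no clinical record are kept unchanged.
    """
    merged = []
    for row in snv_rows:
        clinical = clinical_by_patient.get(row["patient_id"], {})
        merged.append({**row, **clinical})
    return merged


# Toy inputs, invented purely for illustration.
per_patient_tables = {
    "P001": "chrom\tpos\tref\talt\nchr11\t100\tG\tA\n",
    "P002": "chrom\tpos\tref\talt\nchr2\t200\tC\tT\nchr2\t300\tA\tG\n",
}
clinical_by_patient = {
    "P001": {"diagnosis": "AML", "vital_status": "Alive"},
    "P002": {"diagnosis": "ALL", "vital_status": "Dead"},
}

snv_rows = compile_snvs(per_patient_tables)
merged_rows = merge_clinical(snv_rows, clinical_by_patient)
```

Each row of `merged_rows` then carries both the variant fields and the patient-level fields, which is the shape a downstream survival analysis would typically need. A left join is used here so that no SNV is dropped when a clinical record is missing; whether the study handled such cases this way is not stated.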

Recommended Citation

Bataycan, Amanda M, "Merging Clinical and Genomic Data in Patients With Acute Leukemia for Downstream Analysis" (2023). ETD Collection for University of Texas, El Paso. AAI30634766.