Open Access Theses & Dissertations

Gene Selection And Classification In High-Throughput Biological Data With Integrated Machine Learning Algorithms And Bioinformatics Approaches

Abhijeet R Patil, University of Texas at El Paso

Date of Award

2021-05-01

Degree Name

Doctor of Philosophy

Department

Computational Science

Advisor(s)

Ming-Ying Leung

Second Advisor

Sourav Roy

Abstract

With the rise of high-throughput technologies in biomedical research, large volumes of expression profiling, methylation profiling, and RNA-sequencing data are being generated. These high-dimensional data have a large number of features but a small number of samples, a characteristic often referred to as the "curse of dimensionality." Selecting optimal features, which strongly affects the performance of classification algorithms in machine learning models, is therefore a challenging problem in the bioinformatics analysis of such datasets. In this work, I focus on the design of two-stage frameworks for feature selection and classification and their application to multiple colorectal cancer datasets. The first algorithm combined resampling-based least absolute shrinkage and selection operator (lasso) feature selection (RLFS) with ensembles of regularized regression models (ERRM) and is capable of handling data with high correlation structures. Using the top-ranked features obtained from RLFS, the ERRM boosted prediction accuracy. The second algorithm was a modified adaptive lasso method with normalized weights derived from various feature selection methods. Here, the genes were ranked by their levels of statistical significance, and the scores of the ranked genes were normalized and assigned as the proposed weights in the adaptive lasso. This yielded the most significant genes, which are known to be biologically related to the cancer type, and helped attain higher classification performance. Lastly, we introduced a resampling-based group lasso (glasso) feature selection method that takes the group correlation among features into account and can ignore features unrelated to the response variable. When used with various classifiers, the selected features increased classification accuracy. We applied these algorithms to both simulated and real data to show that our methods perform better than existing ones.
In the real data application, we combined machine learning with bioinformatics tools such as STRINGdb and Cytoscape to analyze 13 microarray and RNA-seq datasets and identify hub genes in colorectal cancer. The results could guide further studies aimed at revealing potential biomarkers that might lead to better cancer diagnosis and treatment.
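The abstract describes RLFS only at a high level. As a rough illustration of the general idea of resampling-based lasso feature selection, and not the dissertation's actual RLFS code, the Python sketch below repeatedly draws random half-subsamples of the rows, fits a lasso by coordinate descent on each subsample, and ranks features by how often they receive a nonzero coefficient. The subsample fraction, the fixed penalty `lam`, the frequency-based ranking, and the function names `lasso_cd` and `resampled_lasso_ranking` are all hypothetical choices made for illustration; only the Python standard library is used.

```python
import random

# Hypothetical sketch of resampling-based lasso feature selection.
# It is NOT the dissertation's RLFS implementation; names and defaults are assumptions.


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator used in lasso coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, max_iter=200, tol=1e-8):
    """Lasso via cyclic coordinate descent.

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * sum_j |b_j| on centered data.
    X is a list of n rows with p features each; y is a list of n responses.
    """
    n, p = len(X), len(X[0])
    # Center each column and the response so no intercept term is needed.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    beta = [0.0] * p
    resid = yc[:]  # residual y - X beta; beta starts at zero
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: coefficient stays at zero
            # Correlation of feature j with the partial residual (its own term added back).
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def resampled_lasso_ranking(X, y, lam, n_resamples=50, frac=0.5, seed=0):
    """Rank features by how often the lasso selects them across random subsamples.

    Returns (ranking, freq): ranking lists feature indices from most to least
    frequently selected (ties broken by index); freq[j] is feature j's selection rate.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(2, int(frac * n))
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)  # subsample rows without replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    freq = [c / n_resamples for c in counts]
    ranking = sorted(range(p), key=lambda j: (-freq[j], j))
    return ranking, freq


if __name__ == "__main__":
    # Synthetic example: only features 0 and 1 drive the response.
    rng = random.Random(42)
    n, p = 60, 8
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + 0.1 * rng.gauss(0.0, 1.0) for row in X]
    ranking, freq = resampled_lasso_ranking(X, y, lam=0.3)
    print("top features:", ranking[:2])
    print("selection rates:", [round(f, 2) for f in freq])
```

In practice a library solver such as scikit-learn's `Lasso` would replace the hand-written coordinate descent; it is written out here only to keep the sketch self-contained. Ranking by selection frequency rewards features that survive the lasso penalty across many perturbed versions of the data rather than in a single fit.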

Provenance

Received from ProQuest

Copyright Date

2021-05

File Size

278 p.

File Format

application/pdf

Rights Holder

Abhijeet R Patil

Recommended Citation

Patil, Abhijeet R, "Gene Selection And Classification In High-Throughput Biological Data With Integrated Machine Learning Algorithms And Bioinformatics Approaches" (2021). Open Access Theses & Dissertations. 3316.
https://scholarworks.utep.edu/open_etd/3316

Included in

Applied Mathematics Commons, Bioinformatics Commons, Biostatistics Commons
