Date of Award
Master of Science
High-dimensional data has become a major research area in genetics, bioinformatics, and biostatistics as measurement technologies have advanced. High-dimensional data arise when the number of variables exceeds the number of observations. Modeling high-dimensional gene expression data raises two common issues: many of the genes may not be relevant, and reducing the dimension of the data with penalized logistic regression is difficult when genes are highly correlated. Gene selection has proved to be an effective way to improve the results of many classification methods. Many gene selection methods have been proposed; however, they face a critical challenge in practical applications when there are high correlations among genes. Penalized logistic regression using the Least Absolute Shrinkage and Selection Operator (Lasso) has been criticized for being biased in gene selection. The Adaptive Lasso (Alasso) was proposed to overcome this selection bias by assigning a consistent weight to each gene, yet it faces practical problems in the choice of the initial weight. To address this problem, we propose a penalized logistic regression method that combines a screening approach, used as a filter, with the Adaptive Lasso under a new, alternative weight, with the aim of obtaining an efficient subset of genes with high classification capability. We empirically verified on existing data sets that the proposed method performed better than other existing methods, and we then tested it on the Leukemia Cancer and Colon Cancer data sets. The experimental results show that the proposed method is efficient and feasible and exhibits competitive performance in both classification accuracy and gene selection.
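The two-stage idea described in the abstract (a screening filter followed by adaptively weighted L1-penalized logistic regression) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not specify the proposed weight, so the sketch assumes the conventional Adaptive Lasso weight 1 / (|initial estimate| + eps), taken from a plain Lasso fit. The t-like screening score, the proximal gradient solver, the synthetic data, the tuning values, and all function names are likewise assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def screen_genes(X, y, keep):
    """Filter step (assumed): keep the `keep` genes with the largest
    absolute two-sample t-like statistic between the classes."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    return np.sort(np.argsort(score)[::-1][:keep])

def weighted_l1_logistic(X, y, lam, weights, n_iter=2000):
    """Proximal gradient descent for the mean logistic loss plus
    lam * sum_j weights[j] * |beta[j]|; the intercept is not penalized."""
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    # Step size 1/L, with L an upper bound on the gradient's Lipschitz constant.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(b0 + X @ beta)))
        resid = prob - y
        b0 -= step * resid.mean()
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam * weights)
    return b0, beta

# Synthetic stand-in for a microarray matrix: n samples, p >> n genes.
rng = np.random.default_rng(0)
n, p, keep = 60, 200, 20
X = rng.standard_normal((n, p))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # only genes 0 and 1 are informative
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each gene

# Stage 1: screening filter.
kept = screen_genes(X, y, keep)
X_s = X[:, kept]

# Stage 2: initial Lasso fit, then Adaptive Lasso with weights from that fit.
_, beta_init = weighted_l1_logistic(X_s, y, lam=0.01, weights=np.ones(keep))
adaptive_w = 1.0 / (np.abs(beta_init) + 1e-4)   # assumed conventional weight
b0, beta = weighted_l1_logistic(X_s, y, lam=0.02, weights=adaptive_w)

selected = kept[beta != 0]
print("genes kept by screening:", kept)
print("genes selected by adaptive Lasso:", selected)
```

In the sketch the screening step uses all labels at once for brevity. When estimating classification accuracy, screening and weight estimation would normally be repeated inside each cross-validation fold so that the held-out samples do not influence gene selection.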
Received from ProQuest
Derrick Kwesi Bonney
Bonney, Derrick Kwesi, "General Penalized Logistic Regression For Gene Selection In High-Dimensional Microarray Data Classification" (2020). Open Access Theses & Dissertations. 3084.