Date of Award
2019-01-01
Degree Name
Master of Science
Department
Computational Science
Advisor(s)
Sangjin Kim
Second Advisor
Ming-Ying Leung
Abstract
In high-dimensional data, the performance of various classifiers depends largely on the selection of important features. Most individual classifiers that rely on existing feature selection (FS) methods do not perform well on highly correlated data. Obtaining important features with an FS method and selecting the best-performing classifier is a challenging task in high-throughput data. In this research, we propose a combination of resampling-based least absolute shrinkage and selection operator (LASSO) feature selection (RLFS) and ensembles of regularized regression models (ERRM) capable of handling data with high correlation structures. The ERRM boosts prediction accuracy using the top-ranked features obtained from RLFS. The RLFS uses the LASSO penalty with a sure independence screening condition to select the top k ranked features. The ERRM includes five individual penalty-based methods: LASSO, adaptive LASSO (ALASSO), elastic net (ENET), smoothly clipped absolute deviations (SCAD), and minimax concave penalty (MCP). It is built on the idea of bagging and rank aggregation. Through simulation studies and an application to smokers cancer gene expression data, we demonstrated that the proposed combination of ERRM with RLFS achieved superior performance in accuracy and geometric mean.
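To make the resampling idea behind RLFS concrete, the following is a minimal sketch, not the thesis implementation. It assumes a squared-error LASSO fitted by coordinate descent (the thesis targets classification, so a logistic loss is more likely there), it omits the sure independence screening condition, and it ranks features by how often they receive a nonzero coefficient across random subsamples. The function names, the penalty value `lam`, the subsample fraction, and the synthetic data are all hypothetical choices made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def resampled_lasso_ranking(X, y, lam=0.5, n_resamples=20, frac=0.7, seed=0):
    """Rank features by selection frequency over random subsamples (assumed scheme)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(2, int(frac * n))
    counts = [0] * p
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)
        Xs = [X[i] for i in idx]
        ys = [y[i] for i in idx]
        # Center the subsample so that no intercept term is needed
        y_mean = sum(ys) / m
        x_means = [sum(row[j] for row in Xs) / m for j in range(p)]
        Xc = [[row[j] - x_means[j] for j in range(p)] for row in Xs]
        yc = [v - y_mean for v in ys]
        beta = lasso_cd(Xc, yc, lam)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    freq = [c / n_resamples for c in counts]
    # Stable sort: ties keep the original feature order
    ranking = sorted(range(p), key=lambda j: -freq[j])
    return ranking, freq


# Synthetic example: only features 0 and 1 carry signal (hypothetical data)
data_rng = random.Random(1)
n_obs, n_feat = 60, 10
X = [[data_rng.gauss(0, 1) for _ in range(n_feat)] for _ in range(n_obs)]
y = [3.0 * row[0] - 2.0 * row[1] + data_rng.gauss(0, 0.5) for row in X]

ranking, freq = resampled_lasso_ranking(X, y, lam=0.5, n_resamples=20)
top_k = ranking[:2]  # indices of the k = 2 top-ranked features
```

In this sketch, the top k features would then be passed to each of the penalized models in the ensemble. The selection-frequency ranking is one plausible way to aggregate across resamples; the thesis may rank features differently.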
Language
en
Provenance
Received from ProQuest
Copyright Date
2019-12
File Size
95 pages
File Format
application/pdf
Rights Holder
Abhijeet R. Patil
Recommended Citation
Patil, Abhijeet R., "Combination Of Resampling Based Lasso Feature Selection And Ensembles Of Regularized Regression Models" (2019). Open Access Theses & Dissertations. 2886.
https://scholarworks.utep.edu/open_etd/2886