Date of Award
2024-08-01
Degree Name
Doctor of Philosophy
Department
Mathematical Sciences
Advisor(s)
Xiaogang Su
Abstract
The exponential growth of data has rapidly increased the number of high-dimensional datasets across many domains, posing significant challenges for data analysis, particularly in predictive modeling. Traditional Random Forest (RF), while robust, often struggles with datasets that contain many noisy or non-informative features, which degrades its predictive accuracy. This study introduces High-Dimensional Random Forests (HDRF), an algorithm designed to address these challenges by integrating robust multivariate feature selection directly into the decision tree construction process. Unlike standard RF, HDRF incorporates ridge regression-based variable screening at each decision split, improving its ability to identify and use the most informative features. We conducted extensive simulation studies to demonstrate HDRF's superior performance in handling high-dimensional noise and improving predictive accuracy. Furthermore, HDRF's efficacy is validated through real-world applications to a residential housing dataset for regression and a prostate cancer dataset for classification, showcasing its potential in practical, high-stakes settings. This work extends the capabilities of ensemble learning models in handling complex datasets and provides a basis for future research on algorithmic enhancements for high-dimensional data analysis.
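The abstract does not give implementation details, so the following is only a minimal sketch of what ridge-based variable screening at a single regression split might look like. It is not the dissertation's HDRF implementation. The function names (`ridge_screen`, `best_split`, `screened_split`), the parameters `k` (number of features kept) and `lam` (ridge penalty), and the use of a sum-of-squares split criterion are all assumptions made for illustration.

```python
import numpy as np

def ridge_screen(X, y, k=5, lam=1.0):
    """Return indices of the k features with the largest absolute ridge coefficients.

    Hypothetical helper: the ranking rule and defaults are assumptions.
    """
    # Standardize columns so coefficient magnitudes are comparable across features.
    Xc = X - X.mean(axis=0)
    sd = Xc.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    Z = Xc / sd
    yc = y - y.mean()
    n = Z.shape[0]
    # Dual form of ridge: beta = Z^T (Z Z^T + lam I)^{-1} y.
    # It solves an n x n system, which is cheaper than p x p when p >> n.
    alpha = np.linalg.solve(Z @ Z.T + lam * np.eye(n), yc)
    beta = Z.T @ alpha
    k = min(k, Z.shape[1])
    return np.argsort(-np.abs(beta))[:k]

def best_split(X, y, candidates):
    """Search only the candidate features for the split with the smallest
    total within-child sum of squared errors. Returns (feature, threshold, sse)."""
    best = (None, None, np.inf)
    for j in candidates:
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both children are always non-empty.
        for t in (values[:-1] + values[1:]) / 2:
            mask = X[:, j] <= t
            left, right = y[mask], y[~mask]
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best[2]:
                best = (int(j), float(t), float(sse))
    return best

def screened_split(X, y, k=5, lam=1.0):
    """Screen features with ridge regression, then split on the best survivor."""
    return best_split(X, y, ridge_screen(X, y, k=k, lam=lam))
```

Calling `screened_split(X, y)` on the observations that reach a node returns a feature index, a threshold, and the resulting sum of squared errors. In a full forest this step would be repeated at every node of every tree, and a classification variant would need a different screening model and impurity measure.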
Language
en
Provenance
Received from ProQuest
Copyright Date
2024-08-01
File Size
116 p.
File Format
application/pdf
Rights Holder
George Ekow Quaye
Recommended Citation
Quaye, George Ekow, "Random Forest For High-Dimensional Data" (2024). Open Access Theses & Dissertations. 4201.
https://scholarworks.utep.edu/open_etd/4201