Date of Award

2024-08-01

Degree Name

Doctor of Philosophy

Department

Mathematical Sciences

Advisor(s)

Xiaogang Su

Abstract

The exponential growth of data has led to a rapid increase in high-dimensional datasets across many domains, presenting significant challenges for data analysis, particularly in predictive modeling. Traditional Random Forest (RF), while robust, often struggles with datasets containing numerous noisy or non-informative features, which degrades its predictive accuracy. This study introduces a new algorithm, High-Dimensional Random Forests (HDRF), designed to address these challenges by integrating robust multivariate feature selection directly into the decision tree construction process. Unlike standard RF, HDRF incorporates ridge regression-based variable screening at each decision split, improving its ability to identify and use the most informative features. We conducted extensive simulation studies demonstrating HDRF's superior performance in handling high-dimensional noise and improving predictive accuracy. HDRF's efficacy is further validated through two real-world applications: a residential housing dataset for regression and a prostate cancer dataset for classification, showcasing its potential in practical, high-stakes settings. This work extends the capabilities of ensemble learning models for complex datasets and lays groundwork for future research on algorithmic enhancements for high-dimensional data analysis.
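The abstract says HDRF screens variables with ridge regression at each split but does not spell out the procedure. The sketch below is a generic illustration of that idea, not the dissertation's HDRF algorithm. It assumes squared-error regression splits, a fixed ridge penalty `alpha`, and a fixed number `k` of retained features. The helper names `ridge_screen` and `best_split` are hypothetical.

```python
import numpy as np

def ridge_screen(X, y, alpha=1.0, k=5):
    """Hypothetical helper: rank features by |ridge coefficient| on standardized data, keep top k."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant columns at this node
    Z = (X - mu) / sd
    yc = y - y.mean()
    p = Z.shape[1]
    # Closed-form ridge estimate: (Z'Z + alpha I)^{-1} Z'y
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(p), Z.T @ yc)
    return np.argsort(-np.abs(beta))[:k]

def best_split(X, y, candidates):
    """Hypothetical helper: exhaustive SSE-minimizing split search over the screened features only."""
    best = (None, None, np.inf)
    for j in candidates:
        # Exclude the largest value so both children are non-empty
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if sse < best[2]:
                best = (j, t, sse)
    return best

# Usage on simulated data: 50 features, only the first two are informative (assumed setup)
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

cand = ridge_screen(X, y, alpha=1.0, k=5)
j, t, sse = best_split(X, y, cand)
```

Restricting the split search to a ridge-screened subset means noise features rarely compete for the split, which is the motivation the abstract gives. A full forest would repeat this at every node of every bootstrapped tree.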

Language

en

Provenance

Received from ProQuest

File Size

116 p.

File Format

application/pdf

Rights Holder

George Ekow Quaye
