Date of Award

2021-05-01

Degree Name

Master of Science

Department

Mathematical Sciences

Advisor(s)

Xiaogang X. Su

Abstract

Advances in technology have made it easy to collect and manage high-dimensional data in many fields; however, modeling these data remains a major challenge in data science. High dimensionality degrades the performance and precision of most classification and regression algorithms, including random forests. Random forest (RF) is among the few methods that can be extended to model high-dimensional data; nevertheless, its performance and precision, like those of other methods, suffer in high dimensions, especially when the dataset contains a large number of noise or noninformative features. It is known in the literature that when data are dominated by uninformative features, the expected number of informative variables is small, which makes it difficult to obtain an accurate or robust random forest model.

In this study, we present a new algorithm that uses ridge regression as a variable screening tool to identify informative features in high-dimensional settings and then applies the classical random forest to a top portion of the selected important features. Simulation studies in high dimensions are carried out to test how our proposed method addresses the above problem and improves the performance of random forest models. To illustrate our method, we apply it to a real-life dataset (the Communities and Crime dataset) sourced from the UCI Machine Learning Repository. The results show that variable screening using ridge regression can be a very useful tool for building high-dimensional random forests.
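
The screen-then-fit idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: it assumes a regression target, uses scikit-learn, ranks features by the absolute value of their standardized ridge coefficients, and keeps a fixed top fraction. The function names (`ridge_screen`, `fit_screened_forest`), the penalty `alpha`, the `top_fraction` value, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler


def ridge_screen(X, y, top_fraction=0.1, alpha=1.0):
    """Return the column indices of the top-ranked features.

    Features are ranked by |ridge coefficient| after standardizing X,
    so coefficients are comparable across features (an assumed ranking rule).
    """
    X_std = StandardScaler().fit_transform(X)
    coefs = Ridge(alpha=alpha).fit(X_std, y).coef_
    n_keep = max(1, int(round(top_fraction * X.shape[1])))
    order = np.argsort(-np.abs(coefs))  # largest |coefficient| first
    return np.sort(order[:n_keep])


def fit_screened_forest(X, y, top_fraction=0.1, alpha=1.0, random_state=0):
    """Screen features with ridge regression, then fit a random forest on them."""
    selected = ridge_screen(X, y, top_fraction=top_fraction, alpha=alpha)
    forest = RandomForestRegressor(n_estimators=100, random_state=random_state)
    forest.fit(X[:, selected], y)
    return forest, selected


# Synthetic high-dimensional data (hypothetical): 500 features, 3 informative.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.5 * X[:, 1] + 2.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

forest, selected = fit_screened_forest(X, y, top_fraction=0.05)
predictions = forest.predict(X[:, selected])
```

New observations must be restricted to the same `selected` columns before calling `forest.predict`. In practice both `alpha` and `top_fraction` would likely be tuned, for example by cross-validation; fixed values are used here only to keep the sketch short.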

Language

Provenance

Received from ProQuest

Copyright Date

2021-05

File Size

77 p.

File Format

application/pdf

Rights Holder

Roland Fiagbe

Recommended Citation

Fiagbe, Roland, "High-Dimensional Random Forests" (2021). Open Access Theses & Dissertations. 3252.
https://scholarworks.utep.edu/open_etd/3252

Included in

Statistics and Probability Commons
