Date of Award
2025-05-01
Degree Name
Master of Science
Department
Mathematical Sciences
Advisor(s)
Xiaogang Su
Abstract
Decision trees, particularly those built using the Classification and Regression Trees (CART) algorithm, are widely used for their interpretability and flexibility. However, the greedy nature of the CART splitting procedure gives rise to the end-cut preference (ECP) phenomenon, wherein split points near the extremes of predictor ranges are favored. This study offers a comprehensive investigation of ECP, exploring its theoretical underpinnings, practical manifestations, and implications for both single decision trees and ensemble methods such as Random Forests. Through theoretical analysis and simulation studies, we examine how ECP affects tree structure, variable selection, and predictive accuracy across tree-structured, linear, and nonlinear settings. Our findings reveal that while ECP may have negligible impact on individual tree accuracy, it can negatively influence Random Forests, possibly due to reduced model diversity. To address this, we evaluate the Smooth Sigmoid Surrogate (SSS) method as a regularized alternative to the traditional greedy search, demonstrating its potential to mitigate ECP and enhance model robustness. These insights contribute to a deeper understanding of recursive partitioning methods and inform the design of more reliable tree-based learning algorithms.
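The end-cut preference described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the thesis's own code: it assumes a single uniform predictor and a pure-noise Gaussian response (so no split is truly better than another), and it uses the standard least-squares splitting criterion. The function names `best_split_index` and `end_cut_fraction`, and the parameters `n`, `reps`, `tail`, and `seed`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def best_split_index(x, y):
    """Return the number of observations sent left (1..n-1) by the
    greedy least-squares split on a single predictor x."""
    order = np.argsort(x)
    y_sorted = y[order]
    n = len(y_sorted)
    csum = np.cumsum(y_sorted)
    n_left = np.arange(1, n)          # candidate left-node sizes
    left_sum = csum[:-1]
    right_sum = csum[-1] - left_sum
    # Maximizing this quantity is equivalent to minimizing the
    # within-node sum of squared errors over the two child nodes.
    gain = left_sum**2 / n_left + right_sum**2 / (n - n_left)
    return int(n_left[np.argmax(gain)])


def end_cut_fraction(n=100, reps=2000, tail=0.1, seed=0):
    """Fraction of simulated null datasets whose best split leaves at most
    a `tail` share of the observations in one of the child nodes."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.uniform(size=n)
        y = rng.normal(size=n)        # response is unrelated to x (assumption)
        k = best_split_index(x, y)
        if min(k, n - k) <= tail * n:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # If split positions were chosen uniformly at random, roughly 20% of
    # them would fall in the two outer 10% bands.
    print(end_cut_fraction())
```

If the greedy search showed no positional bias, the printed fraction would be close to 0.2 with the default settings; a value well above that suggests that splits near the extremes of the predictor range are being favored under the null.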
Language
en
Provenance
Received from ProQuest
Copyright Date
2025-05
File Size
58 p.
File Format
application/pdf
Rights Holder
Xiangya Wang
Recommended Citation
Wang, Xiangya, "A Study Of End-Cut Preference In Tree-Based Modeling" (2025). Open Access Theses & Dissertations. 4500.
https://scholarworks.utep.edu/open_etd/4500