Date of Award

2025-05-01

Degree Name

Master of Science

Department

Mathematical Sciences

Advisor(s)

Xiaogang Su

Abstract

Decision trees, particularly those built using the Classification and Regression Trees (CART) algorithm, are widely used for their interpretability and flexibility. However, the greedy nature of the CART splitting procedure gives rise to the end-cut preference (ECP) phenomenon, wherein split points near the extremes of predictor ranges are favored. This study offers a comprehensive investigation of ECP, exploring its theoretical underpinnings, practical manifestations, and implications for both single decision trees and ensemble methods such as Random Forests. Through theoretical analysis and simulation studies, we examine how ECP affects tree structure, variable selection, and predictive accuracy across tree-structured, linear, and nonlinear settings. Our findings reveal that while ECP may have negligible impact on individual tree accuracy, it can negatively influence Random Forests, possibly due to reduced model diversity. To address this, we evaluate the Smooth Sigmoid Surrogate (SSS) method as a regularized alternative to the traditional greedy search, demonstrating its potential to mitigate ECP and enhance model robustness. These insights contribute to a deeper understanding of recursive partitioning methods and inform the design of more reliable tree-based learning algorithms.

Language

en

Provenance

Received from ProQuest

File Size

58 p.

File Format

application/pdf

Rights Holder

Xiangya Wang
