Evaluating Binary Splits on Nominal Inputs
The maximally selected statistic approach used in building tree models, which chooses the split that maximizes a splitting statistic over all candidate splits, is known to cause variable selection bias: a predictor with more distinct levels offers more candidate splits, so its maximized statistic tends to be larger by chance alone and the predictor is favored. In this study we propose three methods to address this problem when building regression trees with nominal predictor variables. Of the three, we explore two in detail and leave the third for future research. The first is an exact method for computing the p-value of the maximized splitting statistic for nominal predictors with at most 10 distinct levels. The second estimates the best cutoff point as a parameter of a parametric nonlinear mixed-effects model and applies to nominal predictors with any number of distinct levels. An extensive simulation study and a real data example show that both methods overcome the variable selection bias.
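The abstract does not describe how the exact p-value is computed, so the following is only an illustrative sketch of what "the p-value of the maximized splitting statistic" means for a nominal predictor. It assumes the splitting statistic is the reduction in sum of squared errors. It enumerates every binary partition of the k levels (2^(k-1) - 1 of them, i.e. 511 when k = 10), takes the largest reduction, and then computes a brute-force permutation p-value by enumerating all n! orderings of the response. This permutation test is a swapped-in technique, not the author's method, and it is only feasible for very small n. All function names (`binary_splits`, `sse_reduction`, `max_stat`, `exact_permutation_pvalue`) are hypothetical.

```python
from itertools import combinations, permutations
from statistics import mean


def binary_splits(levels):
    """Yield the left side of every binary partition of the nominal levels.

    Fixing the first level on the left avoids counting {A | B} and {B | A}
    twice, so k levels give 2**(k - 1) - 1 distinct splits.
    """
    levels = sorted(levels)
    first, rest = levels[0], levels[1:]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            if len(left) < len(levels):  # skip the trivial split with an empty right side
                yield left


def sse_reduction(x, y, left):
    """Between-group sum of squares (reduction in SSE) for `left` versus the remaining levels."""
    yl = [yi for xi, yi in zip(x, y) if xi in left]
    yr = [yi for xi, yi in zip(x, y) if xi not in left]
    if not yl or not yr:
        return 0.0
    ybar = mean(y)
    return len(yl) * (mean(yl) - ybar) ** 2 + len(yr) * (mean(yr) - ybar) ** 2


def max_stat(x, y):
    """Maximally selected statistic: the best SSE reduction over all binary splits of x's levels."""
    return max((sse_reduction(x, y, left) for left in binary_splits(set(x))), default=0.0)


def exact_permutation_pvalue(x, y):
    """Share of all n! response orderings whose maximized statistic reaches the observed one.

    Brute force, so only practical for very small n (n = 8 already means 40,320 orderings).
    """
    observed = max_stat(x, y)
    hits = total = 0
    for perm in permutations(y):
        total += 1
        if max_stat(x, perm) >= observed - 1e-12:  # small tolerance for floating-point ties
            hits += 1
    return hits / total


# Usage on a toy data set (made up for illustration): three levels, two observations each.
x = ["a", "a", "b", "b", "c", "c"]
y = [1.0, 1.1, 5.0, 5.2, 9.0, 9.1]
print(len(list(binary_splits(range(10)))))  # 511 candidate splits for 10 levels
print(max_stat(x, y))                       # about 48.40, from the split {a} | {b, c}
print(exact_permutation_pvalue(x, y))       # 0.2
```

Because the maximum is taken over all splits before comparing with the reference distribution, a predictor is no longer rewarded simply for offering more candidate splits; this is the general idea behind using the p-value of the maximized statistic, rather than the statistic itself, for variable selection.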
Ocloo, Isaac Xoese, "Evaluating Binary Splits on Nominal Inputs" (2017). ETD Collection for University of Texas, El Paso. AAI10607705.