Making Valid Inferences with Decision Tree

George Ekow Quaye, University of Texas at El Paso


Hypothesis tests and confidence interval (CI) estimates are key tools for statistical inference and prediction in data analysis. For most statistical methods, CI estimates can be read directly from the summary output. Decision tree summaries, however, do not provide them. A naïve way of making node-level inference is to construct a $(1-\alpha)\times 100\%$ confidence interval for a node mean using the relation $\bar{y}_t \pm z_{1-\alpha/2}\, s_t/\sqrt{n_t}$, where $\bar{y}_t$ is the node mean, $s_t$ is the node standard deviation (SD) estimate from the decision tree summary, and $n_t$ is the node sample size. These intervals tend to be over-optimistic owing to the highly adaptive nature of tree modeling; in other words, they are too narrow to attain the desired coverage. Valid CIs for tree summaries are among the most common requests from users of decision trees, yet the request is rarely fulfilled in practice. In this research, we seek to pin down the source of the over-optimism and correct it accordingly. We began by treating the issue with an existing method, Bootstrap Calibration (BC), applied to $\alpha$. The BC method, however, is also plagued by overfitted estimates. We then turned to our own approach, Bootstrap Bias Correction (BBC), which corrects the downward bias in the $s_t$ estimates to obtain bias-corrected SD estimates $s_t''$. The node mean $\bar{y}_t$, the node sample size $n_t$, a fixed value of $\alpha$, and the BBC estimate $s_t''$ were then used to obtain more accurate CIs for the node mean through the relation $\bar{y}_t \pm z_{1-\alpha/2}\, s_t''/\sqrt{n_t}$. The CI estimates from the proposed BBC method were assessed empirically and illustrated through simulation studies, and validated on real data.
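As an illustration of the node-level interval formula in the abstract, the following is a minimal sketch of the naïve interval, not the thesis's implementation. The function name `node_ci`, its parameters, and the sample responses are hypothetical and chosen only for this example; the abstract does not specify how the bias-corrected SD is computed, so that step is not sketched here.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def node_ci(y_bar, sd, n_t, alpha=0.05):
    """Return (lower, upper) for y_bar +/- z_{1-alpha/2} * sd / sqrt(n_t).

    Hypothetical helper: `sd` may be the naive node SD or, in principle,
    any corrected SD estimate supplied by the caller.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile
    half_width = z * sd / sqrt(n_t)
    return y_bar - half_width, y_bar + half_width


# Made-up responses for the observations falling in one terminal node.
node_responses = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

y_bar = mean(node_responses)   # node mean
s_t = stdev(node_responses)    # naive (sample) SD within the node
n_t = len(node_responses)      # node sample size

lower, upper = node_ci(y_bar, s_t, n_t, alpha=0.05)
print(f"naive 95% CI for the node mean: ({lower:.3f}, {upper:.3f})")
```

Because the interval width is proportional to the SD estimate, passing a larger, bias-corrected SD to the same function would widen the interval, which is the direction of correction the abstract describes.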

Subject Area

Statistics|Artificial intelligence

Recommended Citation

Quaye, George Ekow, "Making Valid Inferences with Decision Tree" (2021). ETD Collection for University of Texas, El Paso. AAI28540975.