TY - JOUR
T1 - Assessing Decision Tree Stability
T2 - A Comprehensive Method for Generating a Stable Decision Tree
AU - Lee, Jeongeon
AU - Sim, Min Kyu
AU - Hong, Jung Sik
N1 - Publisher Copyright:
© 2024 The Authors.
PY - 2024
Y1 - 2024
N2 - Objectives: This paper proposes a novel stability metric for decision trees that does not rely on the elusive notion of tree similarity. Existing stability metrics have been constructed in a pairwise fashion to assess the similarity between two decision trees. However, quantifying the structural similarity between decision trees is inherently elusive. Conventional stability metrics rely only on partial information, such as the number of nodes and the depth of the tree, which does not adequately capture structural similarity. Methods: We evaluate stability based on the computational burden required to generate a stable tree. First, we generate a stable tree using the novel adaptive node-level stabilization method, which, at each node, selects the predictor chosen most frequently across the bootstrap iterations of the decision tree branching process. Second, stability is measured by the number of bootstraps required to achieve the stable tree. Findings: Using the proposed stability metric, we compare the stability of four popular decision tree splitting criteria: Gini index, entropy, gain ratio, and chi-square. In an empirical study across ten datasets, the gain ratio is the most stable of the four criteria. Additionally, a case study demonstrates that applying the proposed method to the classification and regression tree (CART) algorithm generates a more stable tree than the one produced by the original CART algorithm. Novelty: We propose a stability metric for decision trees that does not require measuring pairwise tree similarity. This paper provides a stability comparison of four popular decision tree splitting criteria, delivering practical insights into their reliability. The adaptive node-level stabilization method can be applied across various decision tree algorithms, enhancing tree stability and reliability in scenarios with updating data.
AB - Objectives: This paper proposes a novel stability metric for decision trees that does not rely on the elusive notion of tree similarity. Existing stability metrics have been constructed in a pairwise fashion to assess the similarity between two decision trees. However, quantifying the structural similarity between decision trees is inherently elusive. Conventional stability metrics rely only on partial information, such as the number of nodes and the depth of the tree, which does not adequately capture structural similarity. Methods: We evaluate stability based on the computational burden required to generate a stable tree. First, we generate a stable tree using the novel adaptive node-level stabilization method, which, at each node, selects the predictor chosen most frequently across the bootstrap iterations of the decision tree branching process. Second, stability is measured by the number of bootstraps required to achieve the stable tree. Findings: Using the proposed stability metric, we compare the stability of four popular decision tree splitting criteria: Gini index, entropy, gain ratio, and chi-square. In an empirical study across ten datasets, the gain ratio is the most stable of the four criteria. Additionally, a case study demonstrates that applying the proposed method to the classification and regression tree (CART) algorithm generates a more stable tree than the one produced by the original CART algorithm. Novelty: We propose a stability metric for decision trees that does not require measuring pairwise tree similarity. This paper provides a stability comparison of four popular decision tree splitting criteria, delivering practical insights into their reliability. The adaptive node-level stabilization method can be applied across various decision tree algorithms, enhancing tree stability and reliability in scenarios with updating data.
KW - Decision trees
KW - splitting criteria
KW - stability
UR - https://www.scopus.com/pages/publications/85197082215
U2 - 10.1109/ACCESS.2024.3419228
DO - 10.1109/ACCESS.2024.3419228
M3 - Article
AN - SCOPUS:85197082215
SN - 2169-3536
VL - 12
SP - 90061
EP - 90072
JO - IEEE Access
JF - IEEE Access
ER -