RuleCOSI+: Rule extraction for interpreting classification tree ensembles

Josue Obregon, Jae Yoon Jung

Research output: Contribution to journal › Article › peer-review

26 Scopus citations

Abstract

Despite the advent of novel neural network architectures, tree-based ensemble algorithms such as random forests and gradient boosting machines still prevail in many practical machine learning problems in the manufacturing, financial, and medical domains. However, tree ensembles have the limitation that the internal decision mechanisms of complex models are difficult to understand. Therefore, we present a post-hoc interpretation approach for classification tree ensembles. The proposed method, RuleCOSI+, extracts simple rules from tree ensembles by greedily combining and simplifying their base trees. Compared with its previous version, RuleCOSI, the new version can be applied to both bagging ensembles (e.g., random forest, RF) and boosting ensembles (e.g., gradient boosting machines, GBM), and it runs much faster for ensembles with hundreds of trees. To assess the performance and applicability of the method, empirical experiments were conducted using two bagging algorithms and four gradient boosting algorithms over 33 datasets. Among five ensemble simplification algorithms, RuleCOSI+ and RuleFit generated the best classification rulesets in terms of F-measure for the RF and GBM models of the datasets, but the rulesets of RuleCOSI+ were, on average, less than half the size of those of RuleFit. Moreover, RuleCOSI+ had the best antecedent uniqueness rate (“UNIQ”) among the five algorithms, and it also ranked high in the number of rules (“NRULES”) and the rule reduction rate (“REDU”). In addition, the proposed method could reduce generalization errors in the simplified rulesets, obtaining, on average, slightly lower classification errors than the original models for the two bagging algorithms and for three of the four gradient boosting algorithms, the exception being CatBoost.
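As a rough illustration of what "combining and simplifying" rules from two base trees can mean, the sketch below joins two conjunctive rules and merges redundant threshold conditions on the same feature, discarding combinations that can never be satisfied. It is not the RuleCOSI+ algorithm itself: the rule representation, the function names `simplify` and `combine`, and the example features are all assumptions made for illustration, and the greedy selection of which combined rules to keep is omitted.

```python
from math import inf

# Assumed representation (not from the paper): a condition is a tuple
# (feature, op, threshold) with op in {"<=", ">"}; a rule antecedent is a
# list of such conditions, interpreted as a conjunction.


def simplify(conditions):
    """Merge redundant conditions per feature.

    Returns a sorted, minimal list of conditions, or None when the
    conjunction is contradictory (no value can satisfy it).
    """
    bounds = {}  # feature -> (exclusive lower bound, inclusive upper bound)
    for feature, op, threshold in conditions:
        lower, upper = bounds.get(feature, (-inf, inf))
        if op == "<=":
            upper = min(upper, threshold)
        elif op == ">":
            lower = max(lower, threshold)
        else:
            raise ValueError(f"unsupported operator: {op}")
        if lower >= upper:
            return None  # e.g. x > 35 and x <= 30
        bounds[feature] = (lower, upper)

    simplified = []
    for feature in sorted(bounds):
        lower, upper = bounds[feature]
        if lower > -inf:
            simplified.append((feature, ">", lower))
        if upper < inf:
            simplified.append((feature, "<=", upper))
    return simplified


def combine(antecedent_a, antecedent_b):
    """Conjoin two rule antecedents and simplify the result."""
    return simplify(antecedent_a + antecedent_b)


# Hypothetical rules taken from two different base trees.
rule_a = [("age", "<=", 50), ("income", ">", 30)]
rule_b = [("age", "<=", 40), ("age", ">", 20)]

combined = combine(rule_a, rule_b)
impossible = combine([("age", "<=", 30)], [("age", ">", 35)])
```

Here `combined` keeps only the tightest bounds on each feature, and `impossible` is `None` because the two conditions on `age` cannot hold together. A full method would additionally have to decide the class label and confidence of each combined rule and which rules to retain.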

Original language: English
Pages (from-to): 355-381
Number of pages: 27
Journal: Information Fusion
Volume: 89
State: Published - Jan 2023

Keywords

  • Ensemble learning
  • Ensemble simplification
  • Explainable artificial intelligence
  • Interpretable machine learning
  • Rule extraction
  • Tree ensembles
