RuleCOSI: Combination and simplification of production rules from boosted decision trees for imbalanced classification

Josue Obregon, Aekyung Kim, Jae Yoon Jung

Research output: Contribution to journal › Article › peer-review

36 Scopus citations

Abstract

In machine learning, the problem of imbalanced classification arises when the classes in the data are unevenly distributed. Different strategies using boosting ensemble algorithms have shown improved results on the imbalanced classification problem by combining weak learners to produce a single strong learner. In particular, decision trees are often used as base learners in ensemble learning for classification or regression. However, boosting ensemble algorithms sometimes generate a large number of decision trees, and the resulting ensemble can grow too large to be understandable and interpretable. Additionally, the use of weights adds more complexity to the final result. For this reason, in this paper we present RuleCOSI, a novel method for combining and simplifying the output of an ensemble of binary decision trees into a single set of production rules. The proposed method takes into account the weight of each decision tree and, using a combination matrix, generates a single set of simplified production rules with performance comparable to that of the original boosting ensemble. To measure the performance and demonstrate the applicability of the proposed method, we carried out an empirical validation using three different boosting algorithms on several well-known machine learning datasets as well as real-life data collected from a manufacturing company. The results of the algorithm are acceptable in most of the experiments, reducing the complexity of the boosting ensemble output while maintaining similar performance.
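To make the starting point of the abstract concrete, the following is a minimal sketch of how each root-to-leaf path of a binary decision tree can be read as one production rule, and how the rules from several weighted trees can be pooled. It is an illustration only: the `Node` class, `tree_to_rules`, `ensemble_to_weighted_rules`, the feature names, and the weights are all hypothetical, and the sketch does not implement the combination matrix or the simplification step that RuleCOSI itself performs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Condition = Tuple[str, str, float]          # (feature, operator, threshold)
Rule = Tuple[List[Condition], int, float]   # (conditions, class label, tree weight)


@dataclass
class Node:
    """Hypothetical binary decision tree node; a leaf has only `label` set."""
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when feature <= threshold
    right: Optional["Node"] = None   # branch taken when feature > threshold
    label: Optional[int] = None


def tree_to_rules(node: Node, conditions: Tuple[Condition, ...] = ()):
    """Turn every root-to-leaf path into (conditions, label)."""
    if node.label is not None:
        return [(list(conditions), node.label)]
    left_cond = (node.feature, "<=", node.threshold)
    right_cond = (node.feature, ">", node.threshold)
    return (tree_to_rules(node.left, conditions + (left_cond,))
            + tree_to_rules(node.right, conditions + (right_cond,)))


def ensemble_to_weighted_rules(trees_with_weights) -> List[Rule]:
    """Pool the rules of all trees, tagging each rule with its tree's weight."""
    rules: List[Rule] = []
    for tree, weight in trees_with_weights:
        for conds, label in tree_to_rules(tree):
            rules.append((conds, label, weight))
    return rules


# Two decision stumps with made-up boosting weights.
stump_a = Node("x1", 0.5, left=Node(label=0), right=Node(label=1))
stump_b = Node("x2", 2.0, left=Node(label=1), right=Node(label=0))
rules = ensemble_to_weighted_rules([(stump_a, 0.7), (stump_b, 0.3)])

for conds, label, weight in rules:
    body = " AND ".join(f"{f} {op} {t}" for f, op, t in conds)
    print(f"IF {body} THEN class={label} (weight={weight})")
```

Even this toy ensemble of two stumps yields four weighted rules, which hints at why a large boosted ensemble becomes hard to read and why a combination and simplification step is needed.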

Original language: English
Pages (from-to): 64-82
Number of pages: 19
Journal: Expert Systems with Applications
Volume: 126
State: Published - 15 Jul 2019

Keywords

  • Boosting
  • Decision trees
  • Ensemble learning
  • Imbalanced classification
  • Rule extraction
