TY - JOUR
T1 - RuleCOSI
T2 - Combination and simplification of production rules from boosted decision trees for imbalanced classification
AU - Obregon, Josue
AU - Kim, Aekyung
AU - Jung, Jae Yoon
N1 - Publisher Copyright: © 2019 Elsevier Ltd
PY - 2019/7/15
Y1 - 2019/7/15
N2 - In the field of machine learning, the problem of imbalanced classification arises when the classes in the data are unevenly distributed. Different strategies using boosting ensemble algorithms have shown improved results on the imbalanced classification problem by combining weak learners to produce a single strong learner. In particular, decision trees are often used as base learners in ensemble learning for classification or regression. However, boosting ensemble algorithms sometimes generate a large number of decision trees that could grow too large to be understandable and interpretable. Additionally, the use of weights adds more complexity to the final result. For this reason, in this paper, we present RuleCOSI, a novel method for combining and simplifying the output of an ensemble of binary decision trees into a single set of production rules. The proposed method takes into account the weight of each decision tree and, using a combination matrix, generates a single set of simplified production rules with performance comparable to that of the original boosting ensemble. In order to measure the performance and prove the applicability of the proposed method, we carried out an empirical validation using three different boosting algorithms over several well-known machine learning datasets as well as real-life data collected from a manufacturing company. The results of the algorithm are acceptable in most of the experiments, reducing the complexity of the boosting ensemble output while maintaining a similar performance.
AB - In the field of machine learning, the problem of imbalanced classification arises when the classes in the data are unevenly distributed. Different strategies using boosting ensemble algorithms have shown improved results on the imbalanced classification problem by combining weak learners to produce a single strong learner. In particular, decision trees are often used as base learners in ensemble learning for classification or regression. However, boosting ensemble algorithms sometimes generate a large number of decision trees that could grow too large to be understandable and interpretable. Additionally, the use of weights adds more complexity to the final result. For this reason, in this paper, we present RuleCOSI, a novel method for combining and simplifying the output of an ensemble of binary decision trees into a single set of production rules. The proposed method takes into account the weight of each decision tree and, using a combination matrix, generates a single set of simplified production rules with performance comparable to that of the original boosting ensemble. In order to measure the performance and prove the applicability of the proposed method, we carried out an empirical validation using three different boosting algorithms over several well-known machine learning datasets as well as real-life data collected from a manufacturing company. The results of the algorithm are acceptable in most of the experiments, reducing the complexity of the boosting ensemble output while maintaining a similar performance.
KW - Boosting
KW - Decision trees
KW - Ensemble learning
KW - Imbalanced classification
KW - Rule extraction
UR - http://www.scopus.com/inward/record.url?scp=85061729685&partnerID=8YFLogxK
U2 - 10.1016/j.eswa.2019.02.012
DO - 10.1016/j.eswa.2019.02.012
M3 - Article
AN - SCOPUS:85061729685
SN - 0957-4174
VL - 126
SP - 64
EP - 82
JO - Expert Systems with Applications
JF - Expert Systems with Applications
ER -