Improved noise-filtering algorithm for AdaBoost using the inter- and intra-class variability of imbalanced datasets

Research output: Contribution to journal › Article › peer-review


Abstract

Boosting methods improve predictive performance by combining multiple weak learners trained sequentially. In particular, Adaptive Boosting (AdaBoost) has been widely used because it yields comparatively strong results on hard-to-learn samples by adjusting misclassification costs. Each weak learner minimizes the expected risk by assigning high misclassification costs to suspect samples. However, the performance of AdaBoost depends on the distribution of noisy samples, because the algorithm tends to overfit them. Various studies have been conducted to address this noise-sensitivity issue. Noise-filtering methods for AdaBoost remove samples identified as noise, based on their degree of misclassification, to prevent overfitting to noisy samples. However, when the classes differ considerably in classification difficulty, samples from the harder-to-classify class are easily labeled as noise. This situation is common in imbalanced datasets and can degrade performance. To solve this problem, this study proposes a new noise-detection algorithm for AdaBoost that considers both the differences in classification difficulty across classes and the characteristics of the iteratively recalculated sample-weight distribution. Experimental results on ten imbalanced datasets with various imbalance ratios demonstrate that the proposed method identifies noisy samples properly and improves the overall performance of AdaBoost.
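The abstract's core idea can be illustrated with a minimal sketch. The code below is not the paper's algorithm; it is a hypothetical toy implementation, using decision stumps on synthetic 1-D data, of (a) the standard AdaBoost weight-update loop and (b) the contrast the abstract draws between a single global misclassification threshold for noise filtering and a class-aware threshold that judges each sample relative to its own class's difficulty. All names, thresholds, and the "mean + 2·std within class" rule are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny imbalanced toy dataset: 40 majority samples (label -1) around 0,
# 10 minority samples (label +1) around 2, with overlap between classes.
n_maj, n_min = 40, 10
X = np.concatenate([rng.normal(0.0, 1.0, n_maj),
                    rng.normal(2.0, 1.0, n_min)]).reshape(-1, 1)
y = np.concatenate([-np.ones(n_maj), np.ones(n_min)])

def best_stump(X, y, w):
    """Exhaustive decision stump minimizing weighted 0-1 error."""
    best_err, best_params = np.inf, None
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for pol in (1.0, -1.0):
                pred = np.where(pol * (X[:, feat] - thresh) < 0, -1.0, 1.0)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best_params = err, (feat, thresh, pol)
    return best_err, best_params

def stump_pred(X, params):
    feat, thresh, pol = params
    return np.where(pol * (X[:, feat] - thresh) < 0, -1.0, 1.0)

# Standard AdaBoost loop; also track how often each sample is misclassified,
# since filtering methods use this as the "degree of misclassification".
T = 20
w = np.full(len(y), 1.0 / len(y))
miss_count = np.zeros(len(y))
for t in range(T):
    err, params = best_stump(X, y, w)
    err = max(err, 1e-10)                        # avoid division by zero
    alpha = 0.5 * np.log((1.0 - err) / err)      # learner weight
    miss = stump_pred(X, params) != y
    miss_count += miss
    w *= np.exp(alpha * np.where(miss, 1.0, -1.0))
    w /= w.sum()                                 # renormalize

# Conventional global filter: one threshold over all samples. If one class
# is simply harder, many of its clean samples get flagged.
global_noise = miss_count / T > 0.5

# Class-aware filter (illustrative stand-in for the paper's idea): flag a
# sample only if it is unusually hard *relative to its own class*.
class_noise = np.zeros(len(y), dtype=bool)
for c in (-1.0, 1.0):
    idx = y == c
    rate = miss_count[idx] / T
    class_noise[idx] = rate > rate.mean() + 2.0 * rate.std()
```

The class-aware rule normalizes difficulty within each class, so minority-class samples that are hard merely because their class overlaps the majority are not wholesale discarded as noise.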

Original language: English
Pages (from-to): 5035-5051
Number of pages: 17
Journal: Journal of Intelligent and Fuzzy Systems
Volume: 43
Issue number: 4
DOIs
State: Published - 2022

Keywords

  • AdaBoost
  • class imbalance
  • class separation
  • noise-filtering
  • noise-robust learning
