Details
Original language | English |
---|---|
Pages (from-to) | 789-826 |
Number of pages | 38 |
Journal | Knowledge and Information Systems |
Volume | 65 |
Issue number | 2 |
Early online date | 2 Nov 2022 |
Publication status | Published - Feb 2023 |
Abstract
Class imbalance poses a major challenge for machine learning, as most supervised learning models can be biased towards the majority class and under-perform on the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, typically via a user-defined, fixed misclassification cost matrix provided as input to the learner. Tuning this matrix is challenging: it requires domain knowledge, and wrong adjustments can degrade overall predictive performance. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to the model’s performance, instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free: it relies on the cumulative behavior of the boosting model to adjust the misclassification costs for the next boosting round, and it comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches, with consistent improvements across measures, for instance 0.3–28.56% for AUC, 3.4–21.4% for balanced accuracy, 4.8–45% for G-mean, and 7.4–85.5% for recall.
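The idea of deriving costs from the cumulative behavior of the ensemble, rather than from a fixed cost matrix, can be illustrated with a minimal sketch. This is not the AdaCC update rule from the paper: the cost formula (1 plus the gap between the ensemble's false negative and false positive rates), the 1-D decision stumps, and all function names are assumptions made for illustration only. Labels are assumed to be +1 (minority) and -1 (majority), with both classes present.

```python
import math

def stump_predict(x, threshold, polarity):
    """1-D decision stump: returns polarity if x >= threshold, else -polarity."""
    return polarity if x >= threshold else -polarity

def fit_stump(xs, ys, weights):
    """Pick the (threshold, polarity) pair with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def ensemble_score(x, ensemble):
    """Weighted vote of all stumps added so far."""
    return sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)

def cumulative_costs(xs, ys, ensemble):
    """Hypothetical cost rule (an assumption, not the paper's formula):
    raise the cost of whichever class the ensemble built so far
    misclassifies more often, and leave the other class at cost 1."""
    preds = [1 if ensemble_score(x, ensemble) >= 0 else -1 for x in xs]
    n_pos = sum(1 for y in ys if y == 1)
    n_neg = len(ys) - n_pos
    fnr = sum(1 for p, y in zip(preds, ys) if y == 1 and p == -1) / n_pos
    fpr = sum(1 for p, y in zip(preds, ys) if y == -1 and p == 1) / n_neg
    cost_pos = 1.0 + max(0.0, fnr - fpr)
    cost_neg = 1.0 + max(0.0, fpr - fnr)
    return cost_pos, cost_neg

def boost(xs, ys, rounds=10):
    """AdaBoost-style loop in which the costs are recomputed every round
    from the cumulative ensemble instead of being fixed up front."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        cost_pos, cost_neg = cumulative_costs(xs, ys, ensemble)
        new_weights = []
        for x, y, w in zip(xs, ys, weights):
            missed = stump_predict(x, threshold, polarity) != y
            cost = cost_pos if y == 1 else cost_neg
            # Misclassified instances of the costlier class gain more weight.
            factor = math.exp(alpha * cost) if missed else math.exp(-alpha)
            new_weights.append(w * factor)
        total = sum(new_weights)
        weights = [w / total for w in new_weights]
    return ensemble
```

In this sketch, no cost has to be supplied by the user: calling `boost(xs, ys, rounds=5)` on an imbalanced toy dataset recomputes both costs after every round, so the class that the current ensemble handles worse receives the larger weight increase in the next round.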
Keywords
- Boosting, Class imbalance, Cost-sensitive learning, Cumulative costs, Dynamic costs
ASJC Scopus subject areas
- Computer Science (all)
  - Software
  - Information Systems
  - Human-Computer Interaction
  - Hardware and Architecture
  - Artificial Intelligence
Cite this
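Iosifidis, Vasileios; Papadopoulos, Symeon; Rosenhahn, Bodo; Ntoutsi, Eirini. AdaCC: cumulative cost-sensitive boosting for imbalanced classification. In: Knowledge and Information Systems, Vol. 65, No. 2, 02.2023, p. 789-826.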
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - AdaCC
T2 - cumulative cost-sensitive boosting for imbalanced classification
AU - Iosifidis, Vasileios
AU - Papadopoulos, Symeon
AU - Rosenhahn, Bodo
AU - Ntoutsi, Eirini
N1 - Funding Information: Open Access funding enabled and organized by Projekt DEAL.
PY - 2023/2
Y1 - 2023/2
N2 - Class imbalance poses a major challenge for machine learning, as most supervised learning models can be biased towards the majority class and under-perform on the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, typically via a user-defined, fixed misclassification cost matrix provided as input to the learner. Tuning this matrix is challenging: it requires domain knowledge, and wrong adjustments can degrade overall predictive performance. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to the model’s performance, instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free: it relies on the cumulative behavior of the boosting model to adjust the misclassification costs for the next boosting round, and it comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches, with consistent improvements across measures, for instance 0.3–28.56% for AUC, 3.4–21.4% for balanced accuracy, 4.8–45% for G-mean, and 7.4–85.5% for recall.
KW - Boosting
KW - Class imbalance
KW - Cost-sensitive learning
KW - Cumulative costs
KW - Dynamic costs
UR - http://www.scopus.com/inward/record.url?scp=85141176623&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2209.08309
DO - 10.48550/arXiv.2209.08309
M3 - Article
AN - SCOPUS:85141176623
VL - 65
SP - 789
EP - 826
JO - Knowledge and Information Systems
JF - Knowledge and Information Systems
SN - 0219-1377
IS - 2
ER -