AdaCC: cumulative cost-sensitive boosting for imbalanced classification

Vasileios Iosifidis; Symeon Papadopoulos; Bodo Rosenhahn; Eirini Ntoutsi

doi:10.48550/arXiv.2209.08309

Details

Original language	English
Pages (from - to)	789-826
Number of pages	38
Journal	Knowledge and information systems
Volume	65
Issue number	2
Early online date	2 Nov 2022
Publication status	Published - Feb 2023

Abstract

Class imbalance poses a major challenge for machine learning as most supervised learning models might exhibit bias towards the majority class and under-perform in the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, formulated typically via a user-defined fixed misclassification cost matrix provided as input to the learner. Such parameter tuning is a challenging task that requires domain knowledge and moreover, wrong adjustments might lead to overall predictive performance deterioration. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to model’s performance instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free as it relies on the cumulative behavior of the boosting model in order to adjust the misclassification costs for the next boosting round and comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches exhibiting consistent improvements in different measures, for instance, in the range of [0.3–28.56%] for AUC, [3.4–21.4%] for balanced accuracy, [4.8–45%] for gmean and [7.4–85.5%] for recall.
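The recall, balanced accuracy and gmean measures named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch under assumptions not stated in this record: the minority class is treated as the positive class, the standard textbook definitions are used, and the function name `imbalance_metrics` and the example counts are hypothetical. It is not taken from the paper.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Recall, balanced accuracy and G-mean from confusion-matrix counts.

    Assumption: the minority class is the positive class.
    """
    # True positive rate (recall on the minority class).
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    # True negative rate (recall on the majority class).
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "recall": tpr,
        "balanced_accuracy": (tpr + tnr) / 2,
        "gmean": math.sqrt(tpr * tnr),
    }


# Hypothetical counts: 10 minority and 990 majority instances,
# scored by a classifier that always predicts the majority class.
always_majority = imbalance_metrics(tp=0, fn=10, fp=0, tn=990)

# Hypothetical counts for a classifier that recovers most minority instances.
balanced_model = imbalance_metrics(tp=8, fn=2, fp=99, tn=891)
```

In the first case plain accuracy would be 99%, yet recall and gmean are zero and balanced accuracy is 0.5, which is why imbalanced-learning work tends to report these measures rather than accuracy alone.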

ASJC Scopus subject areas

Computer Science (all)
Software
Computer Science (all)
Information Systems
Computer Science (all)
Human-Computer Interaction
Computer Science (all)
Hardware and Architecture
Computer Science (all)
Artificial Intelligence

Cite this

AdaCC: cumulative cost-sensitive boosting for imbalanced classification. / Iosifidis, Vasileios; Papadopoulos, Symeon; Rosenhahn, Bodo et al.
In: Knowledge and information systems, Vol. 65, No. 2, 02.2023, pp. 789-826.

Research output: Contribution to journal › Article › Research › Peer-reviewed

Iosifidis, V, Papadopoulos, S, Rosenhahn, B & Ntoutsi, E 2023, 'AdaCC: cumulative cost-sensitive boosting for imbalanced classification', Knowledge and information systems, vol. 65, no. 2, pp. 789-826. https://doi.org/10.48550/arXiv.2209.08309, https://doi.org/10.1007/s10115-022-01780-8

Iosifidis, V., Papadopoulos, S., Rosenhahn, B., & Ntoutsi, E. (2023). AdaCC: cumulative cost-sensitive boosting for imbalanced classification. Knowledge and information systems, 65(2), 789-826. https://doi.org/10.48550/arXiv.2209.08309, https://doi.org/10.1007/s10115-022-01780-8

Iosifidis V, Papadopoulos S, Rosenhahn B, Ntoutsi E. AdaCC: cumulative cost-sensitive boosting for imbalanced classification. Knowledge and information systems. 2023 Feb;65(2):789-826. Epub 2022 Nov 2. doi: 10.48550/arXiv.2209.08309, 10.1007/s10115-022-01780-8

Iosifidis, Vasileios ; Papadopoulos, Symeon ; Rosenhahn, Bodo et al. / AdaCC : cumulative cost-sensitive boosting for imbalanced classification. In: Knowledge and information systems. 2023 ; Vol. 65, No. 2. pp. 789-826.


@article{26a9f4e3bd0946cf82301b102ffa958c,

title = "AdaCC: cumulative cost-sensitive boosting for imbalanced classification",

abstract = "Class imbalance poses a major challenge for machine learning as most supervised learning models might exhibit bias towards the majority class and under-perform in the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, formulated typically via a user-defined fixed misclassification cost matrix provided as input to the learner. Such parameter tuning is a challenging task that requires domain knowledge and moreover, wrong adjustments might lead to overall predictive performance deterioration. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to model{\textquoteright}s performance instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free as it relies on the cumulative behavior of the boosting model in order to adjust the misclassification costs for the next boosting round and comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches exhibiting consistent improvements in different measures, for instance, in the range of [0.3–28.56%] for AUC, [3.4–21.4%] for balanced accuracy, [4.8–45%] for gmean and [7.4–85.5%] for recall.",

keywords = "Boosting, Class imbalance, Cost-sensitive learning, Cumulative costs, Dynamic costs",

author = "Vasileios Iosifidis and Symeon Papadopoulos and Bodo Rosenhahn and Eirini Ntoutsi",

note = "Funding Information: Open Access funding enabled and organized by Projekt DEAL.",

year = "2023",

month = feb,

doi = "10.48550/arXiv.2209.08309",

language = "English",

volume = "65",

pages = "789--826",

journal = "Knowledge and information systems",

issn = "0219-1377",

publisher = "Springer London",

number = "2",

}


TY - JOUR

T1 - AdaCC

T2 - cumulative cost-sensitive boosting for imbalanced classification

AU - Iosifidis, Vasileios

AU - Papadopoulos, Symeon

AU - Rosenhahn, Bodo

AU - Ntoutsi, Eirini

N1 - Funding Information: Open Access funding enabled and organized by Projekt DEAL.

PY - 2023/2

Y1 - 2023/2

N2 - Class imbalance poses a major challenge for machine learning as most supervised learning models might exhibit bias towards the majority class and under-perform in the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, formulated typically via a user-defined fixed misclassification cost matrix provided as input to the learner. Such parameter tuning is a challenging task that requires domain knowledge and moreover, wrong adjustments might lead to overall predictive performance deterioration. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to model’s performance instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free as it relies on the cumulative behavior of the boosting model in order to adjust the misclassification costs for the next boosting round and comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches exhibiting consistent improvements in different measures, for instance, in the range of [0.3–28.56%] for AUC, [3.4–21.4%] for balanced accuracy, [4.8–45%] for gmean and [7.4–85.5%] for recall.

AB - Class imbalance poses a major challenge for machine learning as most supervised learning models might exhibit bias towards the majority class and under-perform in the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, formulated typically via a user-defined fixed misclassification cost matrix provided as input to the learner. Such parameter tuning is a challenging task that requires domain knowledge and moreover, wrong adjustments might lead to overall predictive performance deterioration. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to model’s performance instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free as it relies on the cumulative behavior of the boosting model in order to adjust the misclassification costs for the next boosting round and comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches exhibiting consistent improvements in different measures, for instance, in the range of [0.3–28.56%] for AUC, [3.4–21.4%] for balanced accuracy, [4.8–45%] for gmean and [7.4–85.5%] for recall.

KW - Boosting

KW - Class imbalance

KW - Cost-sensitive learning

KW - Cumulative costs

KW - Dynamic costs

UR - http://www.scopus.com/inward/record.url?scp=85141176623&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2209.08309

DO - 10.48550/arXiv.2209.08309

M3 - Article

AN - SCOPUS:85141176623

VL - 65

SP - 789

EP - 826

JO - Knowledge and information systems

JF - Knowledge and information systems

SN - 0219-1377

IS - 2

ER -
