Multi-label Node Classification On Graph-Structured Data

Tianqi Zhao; Ngan Thi Dong; Alan Hanjalic; Megha Khosla

doi:10.48550/arXiv.2304.10398

Details

Original language	English
Journal	Transactions on Machine Learning Research
Volume	2023
Publication status	E-pub ahead of print - 29 Feb 2024

Abstract

Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity (high homophily), we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analysis of the 9 collected multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performances of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.
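
The abstract does not spell out how homophily is defined for the multi-label scenario. As a rough illustration only, one plausible reading is to average the Jaccard similarity of the label sets at the two endpoints of each edge. The sketch below assumes that reading; the function names, the toy graph, and the label values are hypothetical and are not taken from the paper or its benchmark repository.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard similarity of two label sets; defined as 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def multilabel_edge_homophily(
    edges: Iterable[Tuple[int, int]],
    labels: Dict[int, Set[Hashable]],
) -> float:
    """Mean Jaccard similarity of endpoint label sets over all edges.

    Assumption: this is an illustrative measure, not necessarily the
    definition used by the authors.
    """
    scores = [jaccard(labels[u], labels[v]) for u, v in edges]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy graph: a path 0 - 1 - 2 - 3 with multi-label nodes.
edges = [(0, 1), (1, 2), (2, 3)]
labels = {0: {"a", "b"}, 1: {"a"}, 2: {"a", "b"}, 3: {"c"}}

h = multilabel_edge_homophily(edges, labels)
print(f"edge homophily: {h:.3f}")
```

In a multi-class graph every label set has exactly one element, so each edge contributes either 0 or 1 and the measure reduces to the usual fraction of same-label edges. With multiple labels per node, partial overlap yields intermediate values, which is one way to see why the multi-class semantics do not carry over directly.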

ASJC Scopus subject areas

Computer Science(all)
Artificial Intelligence
Computer Science(all)
Computer Vision and Pattern Recognition

Cite this

Multi-label Node Classification On Graph-Structured Data. / Zhao, Tianqi; Dong, Ngan Thi; Hanjalic, Alan et al.
In: Transactions on Machine Learning Research, Vol. 2023, 29.02.2024.

Research output: Contribution to journal › Article › Research › peer review

Zhao, T, Dong, NT, Hanjalic, A & Khosla, M 2024, 'Multi-label Node Classification On Graph-Structured Data', Transactions on Machine Learning Research, vol. 2023. https://doi.org/10.48550/arXiv.2304.10398

Zhao, T., Dong, N. T., Hanjalic, A., & Khosla, M. (2024). Multi-label Node Classification On Graph-Structured Data. Transactions on Machine Learning Research, 2023. Advance online publication. https://doi.org/10.48550/arXiv.2304.10398

Zhao T, Dong NT, Hanjalic A, Khosla M. Multi-label Node Classification On Graph-Structured Data. Transactions on Machine Learning Research. 2024 Feb 29;2023. Epub 2024 Feb 29. doi: 10.48550/arXiv.2304.10398

Zhao, Tianqi ; Dong, Ngan Thi ; Hanjalic, Alan et al. / Multi-label Node Classification On Graph-Structured Data. In: Transactions on Machine Learning Research. 2024 ; Vol. 2023.

@article{9ff20948353640909e28bb12f12cbc27,

title = "Multi-label Node Classification On Graph-Structured Data",

abstract = "Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity (high homophily), we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analysis of the 9 collected multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performances of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.",

author = "Tianqi Zhao and Dong, {Ngan Thi} and Alan Hanjalic and Megha Khosla",

year = "2024",

month = feb,

day = "29",

doi = "10.48550/arXiv.2304.10398",

language = "English",

volume = "2023",

}

TY - JOUR

T1 - Multi-label Node Classification On Graph-Structured Data

AU - Zhao, Tianqi

AU - Dong, Ngan Thi

AU - Hanjalic, Alan

AU - Khosla, Megha

PY - 2024/2/29

Y1 - 2024/2/29

N2 - Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity (high homophily), we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analysis of the 9 collected multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performances of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.

U2 - 10.48550/arXiv.2304.10398

DO - 10.48550/arXiv.2304.10398

M3 - Article

AN - SCOPUS:86000169387

VL - 2023

JO - Transactions on Machine Learning Research

JF - Transactions on Machine Learning Research

SN - 2835-8856

ER -
