Multi-label Node Classification On Graph-Structured Data

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Tianqi Zhao
  • Ngan Thi Dong
  • Alan Hanjalic
  • Megha Khosla

Research Organisations

External Research Organisations

  • Delft University of Technology

Details

Original language: English
Journal: Transactions on Machine Learning Research
Volume: 2023
Publication status: E-pub ahead of print - 29 Feb 2024

Abstract

Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity (high homophily), we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analysis of the 9 collected multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performance of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.

Cite this

Multi-label Node Classification On Graph-Structured Data. / Zhao, Tianqi; Dong, Ngan Thi; Hanjalic, Alan et al.
In: Transactions on Machine Learning Research, Vol. 2023, 29.02.2024.

Zhao, T, Dong, NT, Hanjalic, A & Khosla, M 2024, 'Multi-label Node Classification On Graph-Structured Data', Transactions on Machine Learning Research, vol. 2023. https://doi.org/10.48550/arXiv.2304.10398
Zhao, T., Dong, N. T., Hanjalic, A., & Khosla, M. (2024). Multi-label Node Classification On Graph-Structured Data. Transactions on Machine Learning Research, 2023. Advance online publication. https://doi.org/10.48550/arXiv.2304.10398
Zhao T, Dong NT, Hanjalic A, Khosla M. Multi-label Node Classification On Graph-Structured Data. Transactions on Machine Learning Research. 2024 Feb 29;2023. Epub 2024 Feb 29. doi: 10.48550/arXiv.2304.10398
Zhao, Tianqi ; Dong, Ngan Thi ; Hanjalic, Alan et al. / Multi-label Node Classification On Graph-Structured Data. In: Transactions on Machine Learning Research. 2024 ; Vol. 2023.
@article{9ff20948353640909e28bb12f12cbc27,
title = "Multi-label Node Classification On Graph-Structured Data",
abstract = "Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity (high homophily), we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analysis of the 9 collected multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performance of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.",
author = "Tianqi Zhao and Dong, {Ngan Thi} and Alan Hanjalic and Megha Khosla",
note = "Publisher Copyright: {\textcopyright} 2023, Transactions on Machine Learning Research. All rights reserved.",
year = "2024",
month = feb,
day = "29",
doi = "10.48550/arXiv.2304.10398",
language = "English",
volume = "2023",
journal = "Transactions on Machine Learning Research",
issn = "2835-8856",

}

TY - JOUR

T1 - Multi-label Node Classification On Graph-Structured Data

AU - Zhao, Tianqi

AU - Dong, Ngan Thi

AU - Hanjalic, Alan

AU - Khosla, Megha

N1 - Publisher Copyright: © 2023, Transactions on Machine Learning Research. All rights reserved.

PY - 2024/2/29

Y1 - 2024/2/29

AB - Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While the success of GNNs is usually attributed to high label similarity (high homophily), we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analysis of the 9 collected multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performance of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.

U2 - 10.48550/arXiv.2304.10398

DO - 10.48550/arXiv.2304.10398

M3 - Article

AN - SCOPUS:86000169387

VL - 2023

JO - Transactions on Machine Learning Research

JF - Transactions on Machine Learning Research

SN - 2835-8856

ER -