Details
Original language | English |
---|---|
Journal | Transactions on Machine Learning Research |
Volume | 2023 |
Publication status | E-pub ahead of print - 29 Feb 2024 |
Abstract
Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While high label similarity (high homophily) is usually attributed to the success of GNNs, we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analyses of the collected 9 multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performances of the methods to assess the progress made by current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.
ASJC Scopus subject areas
- Computer Science(all)
- Artificial Intelligence
- Computer Science(all)
- Computer Vision and Pattern Recognition
Cite this
- Standard
- Harvard
- Apa
- Vancouver
- BibTeX
- RIS
In: Transactions on Machine Learning Research, Vol. 2023, 29.02.2024.
Research output: Contribution to journal › Article › Research › peer review
}
TY - JOUR
T1 - Multi-label Node Classification On Graph-Structured Data
AU - Zhao, Tianqi
AU - Dong, Ngan Thi
AU - Hanjalic, Alan
AU - Khosla, Megha
N1 - Publisher Copyright: © 2023, Transactions on Machine Learning Research. All rights reserved.
PY - 2024/2/29
Y1 - 2024/2/29
N2 - Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While high label similarity (high homophily) is usually attributed to the success of GNNs, we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analyses of the collected 9 multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performances of the methods to assess the progress made by current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.
AB - Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario in which each node could have multiple labels has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to generate datasets with tunable properties. While high label similarity (high homophily) is usually attributed to the success of GNNs, we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily so far defined for a multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analyses of the collected 9 multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performances of the methods to assess the progress made by current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.
U2 - 10.48550/arXiv.2304.10398
DO - 10.48550/arXiv.2304.10398
M3 - Article
AN - SCOPUS:86000169387
VL - 2023
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
SN - 2835-8856
ER -