ORKG-Leaderboards: a systematic workflow for mining leaderboards as a knowledge graph

Publication: Contribution to journal, Article, Research, Peer-reviewed

Authors

Organisational units

External organisations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Pages (from - to): 41-54
Number of pages: 14
Journal: International Journal on Digital Libraries
Volume: 25
Issue number: 1
Early online date: 15 June 2023
Publication status: Published - March 2024

Abstract

The purpose of this work is to describe the orkg-Leaderboard software designed to extract leaderboards defined as task–dataset–metric tuples automatically from large collections of empirical research papers in artificial intelligence (AI). The software can support both the main workflows of scholarly publishing, viz. as LaTeX files or as PDF files. Furthermore, the system is integrated with the open research knowledge graph (ORKG) platform, which fosters the machine-actionable publishing of scholarly findings. Thus, the system’s output, when integrated within the ORKG’s supported Semantic Web infrastructure of representing machine-actionable ‘resources’ on the Web, enables: (1) broadly, the integration of empirical results of researchers across the world, thus enabling transparency in empirical research with the potential to also be complete, contingent on the underlying data source(s) of publications; and (2) specifically, enables researchers to track the progress in AI with an overview of the state-of-the-art across the most common AI tasks and their corresponding datasets via dynamic ORKG frontend views leveraging tables and visualization charts over the machine-actionable data. Our best model achieves performances above 90% F1 on the leaderboard extraction task, thus proving orkg-Leaderboards a practically viable tool for real-world usage. Going forward, in a sense, orkg-Leaderboards transforms the leaderboard extraction task to an automated digitalization task, which has been, for a long time in the community, a crowdsourced endeavor.
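
As a rough illustration of what a leaderboard built from task–dataset–metric tuples might look like as a data structure, the following Python sketch groups reported scores by their tuple and ranks them. It is not the orkg-Leaderboards implementation: the `Result` type, the `build_leaderboards` function, the "higher is better" ranking rule, and all sample values are assumptions made up for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    """One reported score; all field names are illustrative assumptions."""
    task: str
    dataset: str
    metric: str
    paper: str
    score: float


def build_leaderboards(results):
    """Group results by (task, dataset, metric) and rank them, best first.

    Assumes higher scores are better, which does not hold for every metric.
    """
    boards = defaultdict(list)
    for r in results:
        boards[(r.task, r.dataset, r.metric)].append(r)
    return {
        key: sorted(rows, key=lambda r: r.score, reverse=True)
        for key, rows in boards.items()
    }


# Made-up placeholder data, not results from the paper.
results = [
    Result("Question Answering", "Dataset-A", "F1", "Paper X", 88.5),
    Result("Question Answering", "Dataset-A", "F1", "Paper Y", 91.2),
    Result("Summarization", "Dataset-B", "ROUGE-1", "Paper Z", 44.0),
]
boards = build_leaderboards(results)
```

Each key of `boards` is one task–dataset–metric tuple, and its value is the ranked list of papers reporting a score for it, which is roughly the shape a tabular leaderboard view would need.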

ASJC Scopus subject areas

Cite this

ORKG-Leaderboards: a systematic workflow for mining leaderboards as a knowledge graph. / Kabongo, Salomon; D’Souza, Jennifer; Auer, Sören.
In: International Journal on Digital Libraries, Vol. 25, No. 1, 03.2024, pp. 41-54.

Kabongo S, D’Souza J, Auer S. ORKG-Leaderboards: a systematic workflow for mining leaderboards as a knowledge graph. International Journal on Digital Libraries. 2024 Mar;25(1):41-54. Epub 2023 Jun 15. doi: 10.48550/arXiv.2305.11068, 10.1007/s00799-023-00366-1, 10.1007/s00799-024-00405-5
Kabongo, Salomon ; D’Souza, Jennifer ; Auer, Sören. / ORKG-Leaderboards : a systematic workflow for mining leaderboards as a knowledge graph. In: International Journal on Digital Libraries. 2024 ; Vol. 25, No. 1. pp. 41-54.
@article{0cd19c58a8ac404f91a211ec59cd7741,
title = "ORKG-Leaderboards: a systematic workflow for mining leaderboards as a knowledge graph",
abstract = "The purpose of this work is to describe the orkg-Leaderboard software designed to extract leaderboards defined as task–dataset–metric tuples automatically from large collections of empirical research papers in artificial intelligence (AI). The software can support both the main workflows of scholarly publishing, viz. as LaTeX files or as PDF files. Furthermore, the system is integrated with the open research knowledge graph (ORKG) platform, which fosters the machine-actionable publishing of scholarly findings. Thus, the system{\textquoteright}s output, when integrated within the ORKG{\textquoteright}s supported Semantic Web infrastructure of representing machine-actionable {\textquoteleft}resources{\textquoteright} on the Web, enables: (1) broadly, the integration of empirical results of researchers across the world, thus enabling transparency in empirical research with the potential to also be complete, contingent on the underlying data source(s) of publications; and (2) specifically, enables researchers to track the progress in AI with an overview of the state-of-the-art across the most common AI tasks and their corresponding datasets via dynamic ORKG frontend views leveraging tables and visualization charts over the machine-actionable data. Our best model achieves performances above 90\% F1 on the leaderboard extraction task, thus proving orkg-Leaderboards a practically viable tool for real-world usage. Going forward, in a sense, orkg-Leaderboards transforms the leaderboard extraction task to an automated digitalization task, which has been, for a long time in the community, a crowdsourced endeavor.",
keywords = "Information extraction, Knowledge graphs, Neural machine learning, Scholarly text mining, Semantic networks, Table mining",
author = "Salomon Kabongo and Jennifer D{\textquoteright}Souza and S{\"o}ren Auer",
note = "Funding Information: This work was co-funded by the Federal Ministry of Education and Research (BMBF) of Germany for the project LeibnizKILabor (grant no. 01DD20003), BMBF project SCINEXT (GA ID: 01lS22070), NFDI4DataScience (grant no. 460234259) and by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536). ",
year = "2024",
month = mar,
doi = "10.48550/arXiv.2305.11068",
language = "English",
journal = "International Journal on Digital Libraries",
issn = "1432-5012",
volume = "25",
pages = "41--54",
number = "1",

}

TY - JOUR

T1 - ORKG-Leaderboards

T2 - a systematic workflow for mining leaderboards as a knowledge graph

AU - Kabongo, Salomon

AU - D’Souza, Jennifer

AU - Auer, Sören

N1 - Funding Information: This work was co-funded by the Federal Ministry of Education and Research (BMBF) of Germany for the project LeibnizKILabor (grant no. 01DD20003), BMBF project SCINEXT (GA ID: 01lS22070), NFDI4DataScience (grant no. 460234259) and by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536).

PY - 2024/3

Y1 - 2024/3

N2 - The purpose of this work is to describe the orkg-Leaderboard software designed to extract leaderboards defined as task–dataset–metric tuples automatically from large collections of empirical research papers in artificial intelligence (AI). The software can support both the main workflows of scholarly publishing, viz. as LaTeX files or as PDF files. Furthermore, the system is integrated with the open research knowledge graph (ORKG) platform, which fosters the machine-actionable publishing of scholarly findings. Thus, the system’s output, when integrated within the ORKG’s supported Semantic Web infrastructure of representing machine-actionable ‘resources’ on the Web, enables: (1) broadly, the integration of empirical results of researchers across the world, thus enabling transparency in empirical research with the potential to also be complete, contingent on the underlying data source(s) of publications; and (2) specifically, enables researchers to track the progress in AI with an overview of the state-of-the-art across the most common AI tasks and their corresponding datasets via dynamic ORKG frontend views leveraging tables and visualization charts over the machine-actionable data. Our best model achieves performances above 90% F1 on the leaderboard extraction task, thus proving orkg-Leaderboards a practically viable tool for real-world usage. Going forward, in a sense, orkg-Leaderboards transforms the leaderboard extraction task to an automated digitalization task, which has been, for a long time in the community, a crowdsourced endeavor.

AB - The purpose of this work is to describe the orkg-Leaderboard software designed to extract leaderboards defined as task–dataset–metric tuples automatically from large collections of empirical research papers in artificial intelligence (AI). The software can support both the main workflows of scholarly publishing, viz. as LaTeX files or as PDF files. Furthermore, the system is integrated with the open research knowledge graph (ORKG) platform, which fosters the machine-actionable publishing of scholarly findings. Thus, the system’s output, when integrated within the ORKG’s supported Semantic Web infrastructure of representing machine-actionable ‘resources’ on the Web, enables: (1) broadly, the integration of empirical results of researchers across the world, thus enabling transparency in empirical research with the potential to also be complete, contingent on the underlying data source(s) of publications; and (2) specifically, enables researchers to track the progress in AI with an overview of the state-of-the-art across the most common AI tasks and their corresponding datasets via dynamic ORKG frontend views leveraging tables and visualization charts over the machine-actionable data. Our best model achieves performances above 90% F1 on the leaderboard extraction task, thus proving orkg-Leaderboards a practically viable tool for real-world usage. Going forward, in a sense, orkg-Leaderboards transforms the leaderboard extraction task to an automated digitalization task, which has been, for a long time in the community, a crowdsourced endeavor.

KW - Information extraction

KW - Knowledge graphs

KW - Neural machine learning

KW - Scholarly text mining

KW - Semantic networks

KW - Table mining

UR - http://www.scopus.com/inward/record.url?scp=85162073003&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2305.11068

DO - 10.48550/arXiv.2305.11068

M3 - Article

AN - SCOPUS:85162073003

VL - 25

SP - 41

EP - 54

JO - International Journal on Digital Libraries

JF - International Journal on Digital Libraries

SN - 1432-5012

IS - 1

ER -

By the same authors