Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph

Gollam Rabby; Jennifer D’Souza; Allard Oelen; Lucie Dvorackova; Vojtěch Svátek; Sören Auer

doi:10.1186/s13326-023-00298-4

Details

Original language	English
Article number	18
Number of pages	19
Journal	Journal of biomedical semantics
Volume	14
Publication status	Published - 28 Nov 2023

Abstract

Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the task of predicting influential scholarly documents. In this paper, we describe work that attempts to go beyond bibliometric metadata to predict influential scholarly documents, and we also examine the prediction task over categorized scholarly documents. We introduce a new approach that enhances the document representation with a domain-independent knowledge graph in order to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on the theme of COVID-19. The study examines different document representation methods for machine learning, including TF-IDF, bag-of-words (BOW), and embedding-based language models (BERT); of these, the TF-IDF representation works better than the others. Among the machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction. For the latter task, a domain-independent knowledge graph, specifically DBpedia, was used to enhance the document representation for predicting influential scholarly documents with categorical scholarly content. In this setting, our study combines state-of-the-art machine learning methods with the BOW document representation, which we enhance with the direct type (RDF type) and unqualified relation from DBpedia. In this experiment, the enhanced document representation had no impact on scholarly document category classification, but it did have an effect on influential scholarly document prediction with categorical data.
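The representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the tokenizer, the `TOY_TYPES` lookup, the `TYPE=` feature prefix, and all function names are assumptions made for illustration, and the toy lookup stands in for whatever entity linking and DBpedia queries the study actually used. It shows a bag-of-words vector, the same vector enriched with RDF type labels, and a plain TF-IDF weighting.

```python
import math
from collections import Counter

# Hypothetical toy lookup standing in for DBpedia rdf:type results.
# These pairs are illustrative only, not the output of a real query.
TOY_TYPES = {
    "vaccine": ["dbo:Drug"],
    "coronavirus": ["dbo:Species"],
    "wuhan": ["dbo:City", "dbo:Place"],
}

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    stripped = (t.strip(".,;:()").lower() for t in text.split())
    return [t for t in stripped if t]

def bag_of_words(text):
    """Count how often each token occurs in the text."""
    return Counter(tokenize(text))

def enrich_with_types(bow, type_lookup):
    """Copy the BOW and add one feature per type label of each known term."""
    enriched = Counter(bow)
    for term, count in bow.items():
        for type_label in type_lookup.get(term, []):
            enriched["TYPE=" + type_label] += count
    return enriched

def tf_idf(bows):
    """Weight terms by tf = count / document length and idf = log(N / df)."""
    n_docs = len(bows)
    df = Counter()
    for bow in bows:
        df.update(bow.keys())  # each document counts once per term
    weighted = []
    for bow in bows:
        total = sum(bow.values())
        weighted.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bow.items()
        })
    return weighted

docs = [
    "Coronavirus vaccine trial results",
    "Coronavirus outbreak in Wuhan",
]
bows = [bag_of_words(d) for d in docs]
enriched = [enrich_with_types(b, TOY_TYPES) for b in bows]
weights = tf_idf(bows)
```

In a fuller experiment, vectors like `enriched` or `weights` would be passed to a classifier such as logistic regression or a random forest; that step is omitted here to keep the sketch free of third-party dependencies.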

ASJC Scopus subject areas

Computer Science (all)
Information Systems
Computer Science (all)
Computer Science Applications
Medicine (all)
Health Informatics
Computer Science (all)
Computer Networks and Communications

Cite

Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph. / Rabby, Gollam; D’Souza, Jennifer; Oelen, Allard et al.
In: Journal of biomedical semantics, Vol. 14, 18, 28.11.2023.

Publication: Contribution to journal › Article › Research › Peer-reviewed

Rabby, G, D’Souza, J, Oelen, A, Dvorackova, L, Svátek, V & Auer, S 2023, 'Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph', Journal of biomedical semantics, vol. 14, 18. https://doi.org/10.1186/s13326-023-00298-4

Rabby, G., D’Souza, J., Oelen, A., Dvorackova, L., Svátek, V., & Auer, S. (2023). Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph. Journal of biomedical semantics, 14, Article 18. https://doi.org/10.1186/s13326-023-00298-4

Rabby G, D’Souza J, Oelen A, Dvorackova L, Svátek V, Auer S. Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph. Journal of biomedical semantics. 2023 Nov 28;14:18. doi: 10.1186/s13326-023-00298-4

Rabby, Gollam ; D’Souza, Jennifer ; Oelen, Allard et al. / Impact of COVID-19 research : a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph. In: Journal of biomedical semantics. 2023 ; Vol. 14.

@article{709c4c7af5044cf7aae0e9ee58732b2e,

title = "Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph",

keywords = "COVID-19, Domain-independent knowledge graph, Influential scholarly document prediction, Machine learning algorithms, Text mining, World health organization",

author = "Gollam Rabby and Jennifer D{\textquoteright}Souza and Allard Oelen and Lucie Dvorackova and Vojt{\v e}ch Sv{\'a}tek and S{\"o}ren Auer",

note = "Funding Information: Open Access funding enabled and organized by Projekt DEAL. Gollam Rabby was partly supported by grant IGA 16/2022 “PRECOG: Predicting REsearch COncepts of siGnificance” and CIMPLE project (CHIST-ERA-19-XAI-003). S{\"o}ren Auer, Jennifer D{\textquoteright}Souza, and Allard Oelen were partially supported by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the GWK/DFG grant for NFDI4DataScience (460234259). ",

year = "2023",

month = nov,

day = "28",

doi = "10.1186/s13326-023-00298-4",

language = "English",

volume = "14",

journal = "Journal of biomedical semantics",

issn = "2041-1480",

publisher = "BioMed Central Ltd.",

}

TY - JOUR

T1 - Impact of COVID-19 research

T2 - a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph

AU - Rabby, Gollam

AU - D’Souza, Jennifer

AU - Oelen, Allard

AU - Dvorackova, Lucie

AU - Svátek, Vojtěch

AU - Auer, Sören

N1 - Funding Information: Open Access funding enabled and organized by Projekt DEAL. Gollam Rabby was partly supported by grant IGA 16/2022 “PRECOG: Predicting REsearch COncepts of siGnificance” and CIMPLE project (CHIST-ERA-19-XAI-003). Sören Auer, Jennifer D’Souza, and Allard Oelen were partially supported by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the GWK/DFG grant for NFDI4DataScience (460234259).

PY - 2023/11/28

Y1 - 2023/11/28

KW - COVID-19

KW - Domain-independent knowledge graph

KW - Influential scholarly document prediction

KW - Machine learning algorithms

KW - Text mining

KW - World health organization

UR - http://www.scopus.com/inward/record.url?scp=85177870094&partnerID=8YFLogxK

U2 - 10.1186/s13326-023-00298-4

DO - 10.1186/s13326-023-00298-4

M3 - Article

C2 - 38017587

AN - SCOPUS:85177870094

VL - 14

JO - Journal of biomedical semantics

JF - Journal of biomedical semantics

SN - 2041-1480

M1 - 18

ER -

By the same authors

CLEF 2024 SimpleText Track

ORKG-Leaderboards: a systematic workflow for mining leaderboards as a knowledge graph

Increasing Reproducibility in Science by Interlinking Semantic Artifact Descriptions in a Knowledge Graph

Scholarly Knowledge Graph Construction from Published Software Packages

"FAIR-by-Design" Artifacts: Enriching Publications and Software with FAIR Scientific Information at the Time of Creation