Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph

Publication: Contribution to journal › Article › Research › Peer-reviewed

Authors

  • Gollam Rabby
  • Jennifer D’Souza
  • Allard Oelen
  • Lucie Dvorackova
  • Vojtěch Svátek
  • Sören Auer

Organisational units

External organisations

  • University of Economics, Prague
  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Article number: 18
Number of pages: 19
Journal: Journal of biomedical semantics
Volume: 14
Publication status: Published - 28 Nov 2023

Abstract

Multiple studies have investigated bibliometric features and uncategorized scholarly documents for the task of predicting influential scholarly documents. In this paper, we describe work that attempts to go beyond bibliometric metadata to predict influential scholarly documents, and we also examine this prediction task over categorized scholarly documents. We further introduce an approach that enhances the document representation with a domain-independent knowledge graph in order to find influential scholarly documents using categorized scholarly content. As the input collection, we use the WHO corpus of scholarly documents on the theme of COVID-19. The study examines several document representation methods for machine learning, including TF-IDF, bag-of-words (BOW), and embedding-based language models (BERT); of these, the TF-IDF representation works better than the others. Among the machine learning methods tested, logistic regression outperformed the others for scholarly document category classification, and the random forest algorithm obtained the best results for influential scholarly document prediction when a domain-independent knowledge graph, specifically DBpedia, was used to enhance the document representation for categorized scholarly content. For this enhancement, the study combines state-of-the-art machine learning methods with the BOW document representation, which we extend with the direct type (RDF type) and unqualified relation from DBpedia. In this experiment, we did not find any impact of the enhanced document representation on scholarly document category classification, but we did find an effect on influential scholarly document prediction with categorical data.
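The idea of enhancing a BOW representation with RDF types can be illustrated with a minimal sketch. This is not the authors' implementation: the `TOY_TYPES` dictionary is a hypothetical stand-in for a DBpedia lookup, the `TYPE=` feature prefix and all function names are assumptions made for illustration, and the unqualified relation mentioned in the abstract is not modelled here.

```python
from collections import Counter
import re

# Hypothetical stand-in for a DBpedia lookup: token -> rdf:type labels.
# A real pipeline would link entities and query the knowledge graph instead.
TOY_TYPES = {
    "remdesivir": ["dbo:Drug"],
    "wuhan": ["dbo:City"],
    "pneumonia": ["dbo:Disease"],
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bow(text):
    """Plain bag-of-words: token -> occurrence count."""
    return Counter(tokenize(text))


def enhanced_bow(text, types=TOY_TYPES):
    """Bag-of-words plus one extra feature per type of each known token."""
    counts = bow(text)
    # Iterate over a copy, because type features are added to the same Counter.
    for token, n in list(counts.items()):
        for type_label in types.get(token, []):
            counts["TYPE=" + type_label] += n
    return counts


def to_vector(counts, vocabulary):
    """Project a bag onto a fixed, ordered vocabulary."""
    return [counts.get(term, 0) for term in vocabulary]


docs = [
    "Remdesivir trial for pneumonia patients",
    "Pneumonia cases reported in Wuhan",
]
bags = [enhanced_bow(d) for d in docs]
vocabulary = sorted(set().union(*bags))
matrix = [to_vector(b, vocabulary) for b in bags]
```

The resulting `matrix` rows could be passed to any classifier that accepts count features. Type features let two documents that mention different entities of the same type share a dimension, which is one plausible reason such an enhancement might help a downstream model.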

Cite this

Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph. / Rabby, Gollam; D’Souza, Jennifer; Oelen, Allard et al.
In: Journal of biomedical semantics, Vol. 14, 18, 28.11.2023.

Download
@article{709c4c7af5044cf7aae0e9ee58732b2e,
title = "Impact of COVID-19 research: a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph",
keywords = "COVID-19, Domain-independent knowledge graph, Influential scholarly document prediction, Machine learning algorithms, Text mining, World health organization",
author = "Gollam Rabby and Jennifer D{\textquoteright}Souza and Allard Oelen and Lucie Dvorackova and Vojt{\v e}ch Sv{\'a}tek and S{\"o}ren Auer",
note = "Funding Information: Open Access funding enabled and organized by Projekt DEAL. Gollam Rabby was partly supported by grant IGA 16/2022 “PRECOG: Predicting REsearch COncepts of siGnificance” and CIMPLE project (CHIST-ERA-19-XAI-003). S{\"o}ren Auer, Jennifer D{\textquoteright}Souza, and Allard Oelen were partially supported by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the GWK/DFG grant for NFDI4DataScience (460234259). ",
year = "2023",
month = nov,
day = "28",
doi = "10.1186/s13326-023-00298-4",
language = "English",
volume = "14",
journal = "Journal of biomedical semantics",
issn = "2041-1480",
publisher = "BioMed Central Ltd.",

}

Download

TY - JOUR

T1 - Impact of COVID-19 research

T2 - a study on predicting influential scholarly documents using machine learning and a domain-independent knowledge graph

AU - Rabby, Gollam

AU - D’Souza, Jennifer

AU - Oelen, Allard

AU - Dvorackova, Lucie

AU - Svátek, Vojtěch

AU - Auer, Sören

N1 - Funding Information: Open Access funding enabled and organized by Projekt DEAL. Gollam Rabby was partly supported by grant IGA 16/2022 “PRECOG: Predicting REsearch COncepts of siGnificance” and CIMPLE project (CHIST-ERA-19-XAI-003). Sören Auer, Jennifer D’Souza, and Allard Oelen were partially supported by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the GWK/DFG grant for NFDI4DataScience (460234259).

PY - 2023/11/28

Y1 - 2023/11/28

KW - COVID-19

KW - Domain-independent knowledge graph

KW - Influential scholarly document prediction

KW - Machine learning algorithms

KW - Text mining

KW - World health organization

UR - http://www.scopus.com/inward/record.url?scp=85177870094&partnerID=8YFLogxK

U2 - 10.1186/s13326-023-00298-4

DO - 10.1186/s13326-023-00298-4

M3 - Article

C2 - 38017587

AN - SCOPUS:85177870094

VL - 14

JO - Journal of biomedical semantics

JF - Journal of biomedical semantics

SN - 2041-1480

M1 - 18

ER -
