Scholarly knowledge reuse leveraging knowledge graphs

Muhammad Haris

doi:10.15488/18101

Details

Original language	English
Qualification	Doctor rerum naturalium
Awarding Institution	Leibniz University Hannover
Supervised by	Auer, S., Supervisor
Date of Award	29 Oct 2024
Place of Publication	Hannover
Publication status	Published - 5 Nov 2024

Abstract

The invention of the World Wide Web (WWW) has enabled the widespread publication of scholarly knowledge, primarily in the form of research articles. Despite improved access to these publications, scholarly communication remains largely document-based. This leads to inefficient access to and marginal utilization of scholarly knowledge, underscoring the need for more efficient methods for disseminating and retrieving such knowledge. Automating the structured representation of scholarly knowledge presented in research articles is challenging due to the unstructured nature of the content. Consequently, traditional information retrieval systems are inadequate for machine-based exploration and reuse of scholarly knowledge. It is also crucial to address data integration and interoperability challenges, as they significantly affect the reusability of scholarly knowledge. Leveraging the Open Research Knowledge Graph (ORKG), a scholarly infrastructure supporting the production, curation and reuse of FAIR (Findable, Accessible, Interoperable, and Reusable) scholarly knowledge, we present different approaches for systematically extracting, enriching, and querying scholarly knowledge. First, we propose an approach for automatically extracting scholarly knowledge from published software packages by static analysis of their metadata and contents (scripts and data) and populating a scholarly knowledge graph with the extracted knowledge. Our approach is based on mining scientific software packages linked to article publications by extracting metadata and analyzing the Abstract Syntax Tree (AST) of the source code to obtain information about the data used and produced as well as the operations performed on the data. The resulting structured knowledge interrelates articles, software package metadata, and computational techniques applied to the input data used as materials in research work.
Second, we propose an approach for representing, publishing, and using information, extracted from various data sources, about instruments and associated scholarly artefacts. Our approach extracts heterogeneous information about instruments from different data sources and retrieves the artefacts that have been produced by these instruments. The resulting structured knowledge serves as a foundation for exploring and gaining a deeper understanding of the use and role of instruments in research. Third, we propose the DOI-based persistent identification of ORKG artefacts (Papers and Comparisons). This makes ORKG data citable and discoverable in global scholarly infrastructures (e.g., DataCite, OpenAIRE, ORCID). Fourth, we propose a generic approach for linking ORKG content to third-party semantic resources (e.g., taxonomies, thesauri, ontologies). Such linking increases interoperability and facilitates the reuse of scholarly knowledge, primarily by removing ambiguity. Finally, we present a GraphQL-based federated query service for executing distributed queries on multiple scholarly infrastructures (specifically, ORKG, DataCite, OpenAIRE and Wikidata), thus enabling the integrated retrieval of scholarly content from these infrastructures. In summary, our proposed approaches for populating, enriching, and querying scholarly knowledge graphs contribute towards FAIR scholarly knowledge in 21st-century scholarly infrastructures.
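
The AST-based static analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes the analysed scripts are Python, and the function name lists (`READ_FUNCS`, `WRITE_FUNCS`), the helper names, and the sample script are hypothetical choices made only for illustration. It shows how walking a script's syntax tree, without executing it, can yield the files it reads, the files it writes, and the operations it calls.

```python
import ast

# Hypothetical name lists, assumed here for illustration only.
READ_FUNCS = {"read_csv", "read_excel"}
WRITE_FUNCS = {"to_csv", "to_excel"}


def call_name(node: ast.Call) -> str:
    """Return the last name of a call, e.g. 'read_csv' for pd.read_csv(...)."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


def first_string_arg(node: ast.Call):
    """Return the first positional argument if it is a string literal, else None."""
    if node.args:
        arg = node.args[0]
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            return arg.value
    return None


def analyse(source: str) -> dict:
    """Statically collect input files, output files, and operations from a script."""
    inputs, outputs, operations = set(), set(), set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = call_name(node)
        if name in READ_FUNCS:
            path = first_string_arg(node)
            if path is not None:
                inputs.add(path)
        elif name in WRITE_FUNCS:
            path = first_string_arg(node)
            if path is not None:
                outputs.add(path)
        elif name:
            # Any other call is recorded as an operation on the data.
            operations.add(name)
    # Sort so the result does not depend on traversal order.
    return {
        "inputs": sorted(inputs),
        "outputs": sorted(outputs),
        "operations": sorted(operations),
    }


# A made-up analysis script; it is parsed, never executed.
SCRIPT = '''
import pandas as pd
df = pd.read_csv("measurements.csv")
summary = df.groupby("site").mean()
summary.to_csv("summary.csv")
'''

if __name__ == "__main__":
    print(analyse(SCRIPT))
```

Because the script is only parsed, the analysis needs neither the data files nor the script's dependencies to be installed. A fuller pipeline would presumably map the collected items to knowledge graph statements; that step is outside this sketch.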

Cite this

Scholarly knowledge reuse leveraging knowledge graphs. / Haris, Muhammad.
Hannover, 2024. 142 p.

Research output: Thesis › Doctoral thesis

Haris, M 2024, 'Scholarly knowledge reuse leveraging knowledge graphs', Doctor rerum naturalium, Leibniz University Hannover, Hannover. https://doi.org/10.15488/18101

Haris, M. (2024). Scholarly knowledge reuse leveraging knowledge graphs. [Doctoral thesis, Leibniz University Hannover]. https://doi.org/10.15488/18101

Haris M. Scholarly knowledge reuse leveraging knowledge graphs. Hannover, 2024. 142 p. doi: 10.15488/18101

Haris, Muhammad. / Scholarly knowledge reuse leveraging knowledge graphs. Hannover, 2024. 142 p.

@phdthesis{968db5733a844ded97ee62843f89dc28,

title = "Scholarly knowledge reuse leveraging knowledge graphs",

author = "Muhammad Haris",

year = "2024",

month = nov,

day = "5",

doi = "10.15488/18101",

language = "English",

school = "Leibniz University Hannover",

}

TY - BOOK

T1 - Scholarly knowledge reuse leveraging knowledge graphs

AU - Haris, Muhammad

PY - 2024/11/5

Y1 - 2024/11/5

U2 - 10.15488/18101

DO - 10.15488/18101

M3 - Doctoral thesis

CY - Hannover

ER -

By the same author(s)

Leveraging LLMs for Scientific Abstract Summarization: Unearthing the Essence of Research in a Single Sentence

WSDM 2025 General Chairs' Welcome

Leveraging GPT Models For Semantic Table Annotation

LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models

OntoAligner: A Comprehensive Modular and Robust Python Toolkit for Ontology Alignment
