Details
Original language | English |
---|---|
Qualification | Doctor rerum naturalium |
Awarding Institution | |
Supervised by | |
Date of Award | 29 Oct 2024 |
Place of Publication | Hannover |
Publication status | Published - 5 Nov 2024 |
Cite this
Hannover, 2024. 142 p.
Research output: Thesis › Doctoral thesis
TY - BOOK
T1 - Scholarly knowledge reuse leveraging knowledge graphs
AU - Haris, Muhammad
PY - 2024/11/5
Y1 - 2024/11/5
N2 - The invention of the World Wide Web (WWW) has enabled the widespread publication of scholarly knowledge, primarily in the form of research articles. Despite improved access to these publications, scholarly communication remains largely document-based. This leads to inefficient access and marginal utilization of scholarly knowledge, underscoring the need for more efficient methodologies for disseminating and retrieving such knowledge. Automating the structured representation of scholarly knowledge presented in research articles is challenging due to the unstructured nature of the content. Consequently, traditional information retrieval systems have become inadequate for machine-based exploration and reuse of scholarly knowledge. It is also crucial to address data integration and interoperability challenges, as they significantly affect the reusability of scholarly knowledge. Leveraging the Open Research Knowledge Graph (ORKG), a scholarly infrastructure supporting the production, curation and reuse of FAIR (Findable, Accessible, Interoperable, and Reusable) scholarly knowledge, we present different approaches for systematically extracting, enriching, and querying scholarly knowledge. First, we propose an approach for automatically extracting scholarly knowledge from published software packages by static analysis of their metadata and contents (scripts and data) and populating a scholarly knowledge graph with the extracted knowledge. Our approach is based on mining scientific software packages linked to article publications by extracting metadata and analyzing the Abstract Syntax Tree (AST) of the source code to obtain information about the data used and produced as well as the operations performed on the data. The resulting structured knowledge interrelates articles, software package metadata, and computational techniques applied to input data utilized as materials in research work.
Second, we propose an approach for representing, publishing, and using information, extracted from various data sources, about instruments and associated scholarly artefacts. Our approach extracts heterogeneous information about instruments from different data sources and retrieves the artefacts that have been produced by these instruments. The resulting structured knowledge serves as a foundation for exploring and gaining a deeper understanding of the use and role of instruments in research. Third, we propose the DOI-based persistent identification of ORKG artefacts (Papers and Comparisons). This enables ORKG data citability and discovery in global scholarly infrastructures (e.g., DataCite, OpenAIRE, ORCID). Fourth, we propose a generic approach for linking ORKG content to third-party semantic resources (e.g., taxonomies, thesauri, ontologies). Such linking increases interoperability and facilitates the reuse of scholarly knowledge, primarily by removing ambiguity. Finally, we present a GraphQL-based federated query service for executing distributed queries on multiple scholarly infrastructures (specifically, ORKG, DataCite, OpenAIRE and Wikidata), thus enabling the integrated retrieval of scholarly content from these infrastructures. In summary, our proposed approaches for populating, enriching, and querying scholarly knowledge graphs amount to an important and impactful contribution towards FAIR scholarly knowledge in 21st-century scholarly infrastructures.
U2 - 10.15488/18101
DO - 10.15488/18101
M3 - Doctoral thesis
CY - Hannover
ER -
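
The abstract's first contribution, static analysis of a script's Abstract Syntax Tree to find the data it uses and produces and the operations applied to it, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the lookup tables `READ_CALLS` and `WRITE_CALLS`, the helper names, and the sample script are all assumptions made for illustration, and only Python's standard `ast` module is used.

```python
import ast

# Hypothetical lookup tables: call names treated as data reads or writes.
# Illustrative assumptions only, not the rules used in the thesis.
READ_CALLS = {"read_csv", "read_excel"}
WRITE_CALLS = {"to_csv", "to_excel"}


def call_name(node: ast.Call) -> str:
    """Return the trailing name of a call, e.g. 'read_csv' for pd.read_csv(...)."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


def first_string_arg(node: ast.Call):
    """Return the first positional argument if it is a string literal, else None."""
    if node.args:
        arg = node.args[0]
        if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
            return arg.value
    return None


def analyze(source: str) -> dict:
    """Collect data inputs, data outputs, and other operations without running the script."""
    found = {"inputs": [], "outputs": [], "operations": []}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        name = call_name(node)
        path = first_string_arg(node)
        if name in READ_CALLS and path is not None:
            found["inputs"].append(path)
        elif name in WRITE_CALLS and path is not None:
            found["outputs"].append(path)
        elif name:
            found["operations"].append(name)
    return found


# Hypothetical analysis script standing in for a published software package.
SCRIPT = '''
import pandas as pd
df = pd.read_csv("measurements.csv")
summary = df.groupby("site").mean()
summary.to_csv("summary.csv")
'''

result = analyze(SCRIPT)
print(result)
```

Because the script is only parsed and never executed, the sketch needs neither pandas nor the data files to be present. A real pipeline would additionally have to resolve file paths built at run time and map the extracted names onto knowledge graph entities, which this sketch does not attempt.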