Information extraction pipelines for knowledge graphs

Mohamad Yaser Jaradeh; Kuldeep Singh; Markus Stocker; Andreas Both; Sören Auer

doi:10.1007/s10115-022-01826-x

Details

Original language	English
Pages (from-to)	1989-2016
Number of pages	28
Journal	Knowledge and information systems
Volume	65
Issue number	5
Early online date	7 Jan 2023
Publication status	Published - May 2023

Abstract

In the last decade, a large number of knowledge graph (KG) completion approaches were proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG completion have not been studied in the literature. We extend Plumber, a framework that brings together the research community’s disjoint efforts on KG completion. We include more components into the architecture of Plumber to comprise 40 reusable components for various KG completion subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable knowledge extraction pipelines and offers overall 432 distinct pipelines. We study the optimization problem of choosing optimal pipelines based on input sentences. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and finds an appropriate pipeline. We study the efficacy of Plumber for extracting the KG triples using standard datasets over three KGs: DBpedia, Wikidata, and Open Research Knowledge Graph. Our results demonstrate the effectiveness of Plumber in dynamically generating KG completion pipelines, outperforming all baselines agnostic of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among integrated components and discuss their limitations.

Keywords

Information extraction, NLP pipelines, Semantic search, Semantic web, Software reusability

ASJC Scopus subject areas

Computer Science (all): Software
Computer Science (all): Information Systems
Computer Science (all): Human-Computer Interaction
Computer Science (all): Hardware and Architecture
Computer Science (all): Artificial Intelligence

Cite this

Information extraction pipelines for knowledge graphs. / Jaradeh, Mohamad Yaser; Singh, Kuldeep; Stocker, Markus et al.
In: Knowledge and information systems, Vol. 65, No. 5, 05.2023, p. 1989-2016.

Research output: Contribution to journal › Article › Research › peer review

Jaradeh, MY, Singh, K, Stocker, M, Both, A & Auer, S 2023, 'Information extraction pipelines for knowledge graphs', Knowledge and information systems, vol. 65, no. 5, pp. 1989-2016. https://doi.org/10.1007/s10115-022-01826-x

Jaradeh, M. Y., Singh, K., Stocker, M., Both, A., & Auer, S. (2023). Information extraction pipelines for knowledge graphs. Knowledge and information systems, 65(5), 1989-2016. https://doi.org/10.1007/s10115-022-01826-x

Jaradeh MY, Singh K, Stocker M, Both A, Auer S. Information extraction pipelines for knowledge graphs. Knowledge and information systems. 2023 May;65(5):1989-2016. Epub 2023 Jan 7. doi: 10.1007/s10115-022-01826-x

Jaradeh, Mohamad Yaser ; Singh, Kuldeep ; Stocker, Markus et al. / Information extraction pipelines for knowledge graphs. In: Knowledge and information systems. 2023 ; Vol. 65, No. 5. pp. 1989-2016.

@article{dc0cef77a5a2410dabd416e6eb7fb6bd,
  title = "Information extraction pipelines for knowledge graphs",
  abstract = "In the last decade, a large number of knowledge graph (KG) completion approaches were proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG completion have not been studied in the literature. We extend Plumber, a framework that brings together the research community{\textquoteright}s disjoint efforts on KG completion. We include more components into the architecture of Plumber to comprise 40 reusable components for various KG completion subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable knowledge extraction pipelines and offers overall 432 distinct pipelines. We study the optimization problem of choosing optimal pipelines based on input sentences. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and finds an appropriate pipeline. We study the efficacy of Plumber for extracting the KG triples using standard datasets over three KGs: DBpedia, Wikidata, and Open Research Knowledge Graph. Our results demonstrate the effectiveness of Plumber in dynamically generating KG completion pipelines, outperforming all baselines agnostic of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among integrated components and discuss their limitations.",
  keywords = "Information extraction, NLP pipelines, Semantic search, Semantic web, Software reusability",
  author = "Jaradeh, {Mohamad Yaser} and Kuldeep Singh and Markus Stocker and Andreas Both and S{\"o}ren Auer",
  note = "Funding Information: We thank anonymous reviewers for their very useful comments and suggestions. This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the TIB Leibniz Information Centre for Science and Technology. We also thank Allard Oelen and Vitalis Wiens for their valuable feedback.",
  year = "2023",
  month = may,
  doi = "10.1007/s10115-022-01826-x",
  language = "English",
  volume = "65",
  pages = "1989--2016",
  journal = "Knowledge and information systems",
  issn = "0219-1377",
  publisher = "Springer London",
  number = "5",
}

TY - JOUR
T1 - Information extraction pipelines for knowledge graphs
AU - Jaradeh, Mohamad Yaser
AU - Singh, Kuldeep
AU - Stocker, Markus
AU - Both, Andreas
AU - Auer, Sören
N1 - Funding Information: We thank anonymous reviewers for their very useful comments and suggestions. This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and the TIB Leibniz Information Centre for Science and Technology. We also thank Allard Oelen and Vitalis Wiens for their valuable feedback.
PY - 2023/5
Y1 - 2023/5
N2 - In the last decade, a large number of knowledge graph (KG) completion approaches were proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG completion have not been studied in the literature. We extend Plumber, a framework that brings together the research community’s disjoint efforts on KG completion. We include more components into the architecture of Plumber to comprise 40 reusable components for various KG completion subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable knowledge extraction pipelines and offers overall 432 distinct pipelines. We study the optimization problem of choosing optimal pipelines based on input sentences. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and finds an appropriate pipeline. We study the efficacy of Plumber for extracting the KG triples using standard datasets over three KGs: DBpedia, Wikidata, and Open Research Knowledge Graph. Our results demonstrate the effectiveness of Plumber in dynamically generating KG completion pipelines, outperforming all baselines agnostic of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among integrated components and discuss their limitations.
AB - In the last decade, a large number of knowledge graph (KG) completion approaches were proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG completion have not been studied in the literature. We extend Plumber, a framework that brings together the research community’s disjoint efforts on KG completion. We include more components into the architecture of Plumber to comprise 40 reusable components for various KG completion subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable knowledge extraction pipelines and offers overall 432 distinct pipelines. We study the optimization problem of choosing optimal pipelines based on input sentences. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and finds an appropriate pipeline. We study the efficacy of Plumber for extracting the KG triples using standard datasets over three KGs: DBpedia, Wikidata, and Open Research Knowledge Graph. Our results demonstrate the effectiveness of Plumber in dynamically generating KG completion pipelines, outperforming all baselines agnostic of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among integrated components and discuss their limitations.
KW - Information extraction
KW - NLP pipelines
KW - Semantic search
KW - Semantic web
KW - Software reusability
UR - http://www.scopus.com/inward/record.url?scp=85145703568&partnerID=8YFLogxK
U2 - 10.1007/s10115-022-01826-x
DO - 10.1007/s10115-022-01826-x
M3 - Article
AN - SCOPUS:85145703568
VL - 65
SP - 1989
EP - 2016
JO - Knowledge and information systems
JF - Knowledge and information systems
SN - 0219-1377
IS - 5
ER -

By the same author(s)

Leveraging GPT Models For Semantic Table Annotation

Managing Comprehensive Research Instrument Descriptions Within a Scholarly Knowledge Graph

DataDesc: A framework for creating and sharing technical metadata for research software interfaces

Leveraging LLMs for Scientific Abstract Summarization: Unearthing the Essence of Research in a Single Sentence

WSDM 2025 General Chairs' Welcome
