Empowering the SDM-RDFizer tool for scaling up to complex knowledge graph creation pipelines

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Maria Esther Vidal
  • Diego Collarana
  • David Chaves-Fraga
  • Enrique Antonio Iglesias Vidal

External Research Organisations

  • German National Library of Science and Technology (TIB)
  • University of Bonn
  • Universidad de Santiago de Compostela

Details

Original language: English
Article number: SW-243580
Journal: Semantic web
Volume: 16
Issue number: 2
Publication status: Published - Jan 2025

Abstract

The significant increase in data volume in recent years has prompted the adoption of knowledge graphs as valuable data structures for integrating diverse data and metadata. However, this surge in data availability has brought to light challenges related to standardization, interoperability, and data quality. Knowledge graph creation faces complexities from large data volumes, data heterogeneity, and high duplicate rates. This work addresses these challenges and proposes data management techniques to scale up the creation of knowledge graphs specified using the RDF Mapping Language (RML). These techniques are integrated into SDM-RDFizer, transforming it into a two-fold solution designed to address the complexities of generating knowledge graphs. Firstly, we introduce a reordering approach for RML triples maps, prioritizing the evaluation of the most selective maps first to reduce memory usage. Secondly, we employ an RDF compression strategy, along with optimized data structures and novel operators, to prevent the generation of duplicate RDF triples and optimize the execution of RML operators. We assess the performance of SDM-RDFizer through established benchmarks. The evaluation showcases the effectiveness of SDM-RDFizer compared to state-of-the-art RML engines, emphasizing the benefits of our techniques. Furthermore, the paper presents real-world projects where SDM-RDFizer has been utilized, providing insights into the advantages of declaratively defining knowledge graphs and efficiently executing these specifications using this engine.
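
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not SDM-RDFizer's implementation: the names `TermDictionary`, `deduplicate`, and `order_by_selectivity` are hypothetical, the compression is assumed to be a simple term-to-integer dictionary, and selectivity is assumed to be approximated by an estimated row count per triples map.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


class TermDictionary:
    """Hypothetical helper: maps each distinct RDF term to a small integer ID."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}
        self._terms: List[str] = []

    def encode(self, term: str) -> int:
        # Reuse the existing ID if the term was seen before, else assign the next one.
        term_id = self._ids.get(term)
        if term_id is None:
            term_id = len(self._terms)
            self._ids[term] = term_id
            self._terms.append(term)
        return term_id

    def decode(self, term_id: int) -> str:
        return self._terms[term_id]


def deduplicate(triples: Iterable[Triple]) -> List[Triple]:
    """Keep the first occurrence of each triple, comparing compact integer keys."""
    terms = TermDictionary()
    seen: Set[Tuple[int, int, int]] = set()
    unique: List[Triple] = []
    for s, p, o in triples:
        key = (terms.encode(s), terms.encode(p), terms.encode(o))
        if key not in seen:
            seen.add(key)
            unique.append((s, p, o))
    return unique


def order_by_selectivity(estimated_rows: Dict[str, int]) -> List[str]:
    """Assumption: a map expected to yield fewer rows counts as more selective.

    Returns map names with the most selective first; ties are broken by name.
    """
    return sorted(estimated_rows, key=lambda name: (estimated_rows[name], name))


if __name__ == "__main__":
    sample = [
        ("ex:a", "ex:p", "ex:b"),
        ("ex:a", "ex:p", "ex:b"),  # duplicate, dropped
        ("ex:b", "ex:p", "ex:a"),
    ]
    print(deduplicate(sample))
    print(order_by_selectivity({"m1": 500, "m2": 10, "m3": 10}))
```

Storing integer keys rather than full term strings in the `seen` set is what keeps the duplicate check compact in this sketch; the engine described in the paper may organise its data structures and operators quite differently.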

Keywords

    Data integration systems, knowledge graphs, RDF mapping languages, DOI error

Cite this

Empowering the SDM-RDFizer tool for scaling up to complex knowledge graph creation pipelines. / Vidal, Maria Esther; Collarana, Diego; Chaves-Fraga, David et al.
In: Semantic web, Vol. 16, No. 2, SW-243580, 01.2025.

Vidal, ME, Collarana, D, Chaves-Fraga, D & Iglesias Vidal, EA 2025, 'Empowering the SDM-RDFizer tool for scaling up to complex knowledge graph creation pipelines', Semantic web, vol. 16, no. 2, SW-243580. https://doi.org/10.3233/SW-243580
Vidal, M. E., Collarana, D., Chaves-Fraga, D., & Iglesias Vidal, E. A. (2025). Empowering the SDM-RDFizer tool for scaling up to complex knowledge graph creation pipelines. Semantic web, 16(2), Article SW-243580. https://doi.org/10.3233/SW-243580
Vidal ME, Collarana D, Chaves-Fraga D, Iglesias Vidal EA. Empowering the SDM-RDFizer tool for scaling up to complex knowledge graph creation pipelines. Semantic web. 2025 Jan;16(2):SW-243580. doi: 10.3233/SW-243580
Vidal, Maria Esther ; Collarana, Diego ; Chaves-Fraga, David et al. / Empowering the SDM-RDFizer tool for scaling up to complex knowledge graph creation pipelines. In: Semantic web. 2025 ; Vol. 16, No. 2.
@article{4a837ee78625458d98a07be4cc339f8b,
title = "Empowering the SDM-RDFizer tool for scaling up to complex knowledge graph creation pipelines",
abstract = "The significant increase in data volume in recent years has prompted the adoption of knowledge graphs as valuable data structures for integrating diverse data and metadata. However, this surge in data availability has brought to light challenges related to standardization, interoperability, and data quality. Knowledge graph creation faces complexities from large data volumes, data heterogeneity, and high duplicate rates. This work addresses these challenges and proposes data management techniques to scale up the creation of knowledge graphs specified using the RDF Mapping Language (RML). These techniques are integrated into SDM-RDFizer, transforming it into a two-fold solution designed to address the complexities of generating knowledge graphs. Firstly, we introduce a reordering approach for RML triples maps, prioritizing the evaluation of the most selective maps first to reduce memory usage. Secondly, we employ an RDF compression strategy, along with optimized data structures and novel operators, to prevent the generation of duplicate RDF triples and optimize the execution of RML operators. We assess the performance of SDM-RDFizer through established benchmarks. The evaluation showcases the effectiveness of SDM-RDFizer compared to state-of-the-art RML engines, emphasizing the benefits of our techniques. Furthermore, the paper presents real-world projects where SDM-RDFizer has been utilized, providing insights into the advantages of declaratively defining knowledge graphs and efficiently executing these specifications using this engine.",
keywords = "Data integration systems, knowledge graphs, RDF mapping languages, DOI error",
author = "Vidal, {Maria Esther} and Diego Collarana and David Chaves-Fraga and {Iglesias Vidal}, {Enrique Antonio}",
note = "Publisher Copyright: {\textcopyright} 2024 – The authors. Published by IOS Press.",
year = "2025",
month = jan,
doi = "10.3233/SW-243580",
language = "English",
volume = "16",
journal = "Semantic web",
issn = "1570-0844",
publisher = "SAGE Publications Ltd",
number = "2",

}

TY - JOUR

T1 - Empowering the SDM-RDFizer tool for scaling up to complex knowledge graph creation pipelines

AU - Vidal, Maria Esther

AU - Collarana, Diego

AU - Chaves-Fraga, David

AU - Iglesias Vidal, Enrique Antonio

N1 - Publisher Copyright: © 2024 – The authors. Published by IOS Press.

PY - 2025/1

Y1 - 2025/1

AB - The significant increase in data volume in recent years has prompted the adoption of knowledge graphs as valuable data structures for integrating diverse data and metadata. However, this surge in data availability has brought to light challenges related to standardization, interoperability, and data quality. Knowledge graph creation faces complexities from large data volumes, data heterogeneity, and high duplicate rates. This work addresses these challenges and proposes data management techniques to scale up the creation of knowledge graphs specified using the RDF Mapping Language (RML). These techniques are integrated into SDM-RDFizer, transforming it into a two-fold solution designed to address the complexities of generating knowledge graphs. Firstly, we introduce a reordering approach for RML triples maps, prioritizing the evaluation of the most selective maps first to reduce memory usage. Secondly, we employ an RDF compression strategy, along with optimized data structures and novel operators, to prevent the generation of duplicate RDF triples and optimize the execution of RML operators. We assess the performance of SDM-RDFizer through established benchmarks. The evaluation showcases the effectiveness of SDM-RDFizer compared to state-of-the-art RML engines, emphasizing the benefits of our techniques. Furthermore, the paper presents real-world projects where SDM-RDFizer has been utilized, providing insights into the advantages of declaratively defining knowledge graphs and efficiently executing these specifications using this engine.

KW - Data integration systems

KW - knowledge graphs

KW - RDF mapping languages

KW - DOI error

U2 - 10.3233/SW-243580

DO - 10.3233/SW-243580

M3 - Article

AN - SCOPUS:105000688528

VL - 16

JO - Semantic web

JF - Semantic web

SN - 1570-0844

IS - 2

M1 - SW-243580

ER -