Tracing the Impact of Bias in Link Prediction

Publication: Contribution to book/report/anthology/conference proceedings › Paper in conference proceedings › Research › Peer-reviewed

Authors

  • Mayra Russo
  • Sammy Fabian Sawischa
  • Maria Esther Vidal

External organizations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Title of host publication: SAC '24
Subtitle: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing
Pages: 1626-1633
Number of pages: 8
ISBN (electronic): 9798400702433
Publication status: Published - 21 May 2024
Event: 39th Annual ACM Symposium on Applied Computing, SAC 2024 - Avila, Spain
Duration: 8 Apr 2024 to 12 Apr 2024

Abstract

Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To avoid overstating the performance of these techniques and to assess it accurately, a comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessments of data sources to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret their output.
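The test-leakage notion in the abstract can be sketched in a few lines. This is an illustrative reconstruction under assumptions, not the paper's SPARQL measures or code: the toy train/test triples, the `inverse` relation map, and the `leaked` helper are all invented for the example.

```python
# Illustrative sketch only, not the paper's code. "Test leakage bias" here
# means test triples that duplicate, or are inverses of, training triples,
# so a model can score them well by memorization alone.

train = {
    ("a", "parentOf", "b"),
    ("b", "childOf", "a"),
    ("c", "knows", "d"),
}
test = {
    ("a", "parentOf", "b"),  # exact duplicate of a training triple
    ("d", "knows", "c"),     # inverse of a training triple (knows is symmetric here)
    ("a", "knows", "c"),     # clean: no duplicate or inverse in training
}

# Hypothetical inverse-relation map; real analyses derive such pairs from the data.
inverse = {"knows": "knows", "parentOf": "childOf", "childOf": "parentOf"}

def leaked(test_set, train_set, inv):
    """Test triples answerable by copying or inverting a training triple."""
    return {
        (h, r, t)
        for (h, r, t) in test_set
        if (h, r, t) in train_set or (t, inv.get(r), h) in train_set
    }

hits = leaked(test, train, inverse)
print(len(hits), round(len(hits) / len(test), 2))  # 2 of 3 test triples leak
```

A related sample-selection check would count triples per relation and flag heavily overrepresented ones; the paper expresses such measures as SPARQL queries so they run over any RDF graph.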


Cite this

Tracing the Impact of Bias in Link Prediction. / Russo, Mayra; Sawischa, Sammy Fabian; Vidal, Maria Esther.
SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing. 2024. pp. 1626-1633.


Russo, M, Sawischa, SF & Vidal, ME 2024, Tracing the Impact of Bias in Link Prediction. in SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing. pp. 1626-1633, 39th Annual ACM Symposium on Applied Computing, SAC 2024, Avila, Spain, 8 Apr 2024. https://doi.org/10.1145/3605098.3635912
Russo, M., Sawischa, S. F., & Vidal, M. E. (2024). Tracing the Impact of Bias in Link Prediction. In SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (pp. 1626-1633). https://doi.org/10.1145/3605098.3635912
Russo M, Sawischa SF, Vidal ME. Tracing the Impact of Bias in Link Prediction. In: SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing. 2024. p. 1626-1633. doi: 10.1145/3605098.3635912
Russo, Mayra ; Sawischa, Sammy Fabian ; Vidal, Maria Esther. / Tracing the Impact of Bias in Link Prediction. SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing. 2024. pp. 1626-1633
BibTeX
@inproceedings{42e21807d31a4b97baca10cac52db2bf,
title = "Tracing the Impact of Bias in Link Prediction",
abstract = "Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To avoid overstating the performance of these techniques and to assess it accurately, a comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessments of data sources to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret their output.",
keywords = "bias, knowledge graphs, link prediction",
author = "Mayra Russo and Sawischa, {Sammy Fabian} and Vidal, {Maria Esther}",
note = "Publisher Copyright: {\textcopyright} 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.; 39th Annual ACM Symposium on Applied Computing, SAC 2024 ; Conference date: 08-04-2024 Through 12-04-2024",
year = "2024",
month = may,
day = "21",
doi = "10.1145/3605098.3635912",
language = "English",
pages = "1626--1633",
booktitle = "SAC '24",

}

RIS

TY - GEN

T1 - Tracing the Impact of Bias in Link Prediction

AU - Russo, Mayra

AU - Sawischa, Sammy Fabian

AU - Vidal, Maria Esther

N1 - Publisher Copyright: © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

PY - 2024/5/21

Y1 - 2024/5/21

N2 - Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To avoid overstating the performance of these techniques and to assess it accurately, a comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessments of data sources to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret their output.

AB - Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To avoid overstating the performance of these techniques and to assess it accurately, a comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessments of data sources to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret their output.

KW - bias

KW - knowledge graphs

KW - link prediction

UR - http://www.scopus.com/inward/record.url?scp=85197662345&partnerID=8YFLogxK

U2 - 10.1145/3605098.3635912

DO - 10.1145/3605098.3635912

M3 - Conference contribution

AN - SCOPUS:85197662345

SP - 1626

EP - 1633

BT - SAC '24

T2 - 39th Annual ACM Symposium on Applied Computing, SAC 2024

Y2 - 8 April 2024 through 12 April 2024

ER -