LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models

Research output: Chapter in book/report/conference proceeding, Conference contribution, Research, peer review

Authors

  • Sameer Sadruddin
  • Jennifer D’Souza
  • Eleni Poupaki
  • Alex Watkins
  • Hamed Babaei Giglou
  • Anisa Rula
  • Bora Karasulu
  • Sören Auer
  • Adrie Mackus
  • Erwin Kessels

External Research Organisations

  • German National Library of Science and Technology (TIB)
  • Eindhoven University of Technology (TU/e)
  • University of Warwick
  • University of Brescia

Details

Original language: English
Title of host publication: The Semantic Web
Subtitle of host publication: 22nd European Semantic Web Conference, ESWC 2025, Proceedings
Editors: Edward Curry, Maribel Acosta, Maria Poveda-Villalón, Marieke van Erp, Adegboyega Ojo, Katja Hose, Cogan Shimizu, Pasquale Lisena
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 244-261
Number of pages: 18
ISBN (electronic): 978-3-031-94578-6
ISBN (print): 9783031945779
Publication status: Published - 31 May 2025
Event: 22nd European Semantic Web Conference, ESWC 2025 - Portoroz, Slovenia
Duration: 1 Jun 2025 to 5 Jun 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15719 LNCS
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science—specifically atomic layer deposition—schema-miner demonstrates that expert-guided LLMs generate semantically rich schemas suitable for diverse real-world applications.
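The abstract describes the workflow only at a high level. The sketch below is a minimal illustration under stated assumptions, not the authors' schema-miner implementation: the schema is simplified to a flat dict of JSON-Schema-style property specs, `keyword_proposer` is a hypothetical stand-in for the LLM step, `expert_review` is a hypothetical stand-in for expert feedback, and the loop stops once a full round (proposals plus review) leaves the schema unchanged. Ontology integration is not modeled.

```python
from typing import Callable

# Hypothetical simplification: a schema is a flat mapping from property
# name to a JSON-Schema-style property description.
Schema = dict[str, dict]


def refine_schema(
    schema: Schema,
    documents: list[str],
    propose: Callable[[Schema, str], Schema],
    review: Callable[[Schema], Schema],
    max_rounds: int = 5,
) -> Schema:
    """Alternate automated proposals and expert review until the schema stops changing."""
    for _ in range(max_rounds):
        before = dict(schema)
        for doc in documents:
            schema = propose(schema, doc)  # stand-in for an LLM call per document
        schema = review(schema)  # stand-in for human-in-the-loop feedback
        if schema == before:  # fixed point: this round produced no net change
            break
    return schema


# Hypothetical vocabulary so the sketch runs without any model or UI.
VOCAB = {"precursor": "string", "temperature": "number", "pulse time": "number", "reactor": "string"}


def keyword_proposer(schema: Schema, doc: str) -> Schema:
    """Hypothetical proposer: add each vocabulary term mentioned in the text as a property."""
    updated = dict(schema)
    for term, json_type in VOCAB.items():
        key = term.replace(" ", "_")
        if term in doc.lower() and key not in updated:
            updated[key] = {"type": json_type}
    return updated


def expert_review(schema: Schema) -> Schema:
    """Hypothetical expert: reject properties judged out of scope."""
    rejected = {"reactor"}
    return {name: spec for name, spec in schema.items() if name not in rejected}


docs = [
    "The precursor TMA was pulsed at a substrate temperature of 200 C.",
    "Pulse time was varied in a commercial reactor.",
]
final = refine_schema({}, docs, keyword_proposer, expert_review)
print(sorted(final))  # ['precursor', 'pulse_time', 'temperature']
```

In this toy run the expert's rejection of `reactor` is re-applied every round, so the loop converges after the second round; a fuller design would likely persist rejected proposals so they are not re-suggested.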

Keywords

    Human-in-the-loop Workflow, Large Language Models, Schema Discovery, Schema Mining, Scientific Schemas

Cite this

LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models. / Sadruddin, Sameer; D’Souza, Jennifer; Poupaki, Eleni et al.
The Semantic Web : 22nd European Semantic Web Conference, ESWC 2025, Proceedings. ed. / Edward Curry; Maribel Acosta; Maria Poveda-Villalón; Marieke van Erp; Adegboyega Ojo; Katja Hose; Cogan Shimizu; Pasquale Lisena. Springer Science and Business Media Deutschland GmbH, 2025. p. 244-261 (Lecture Notes in Computer Science; Vol. 15719 LNCS).

Sadruddin, S, D’Souza, J, Poupaki, E, Watkins, A, Babaei Giglou, H, Rula, A, Karasulu, B, Auer, S, Mackus, A & Kessels, E 2025, LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models. in E Curry, M Acosta, M Poveda-Villalón, M van Erp, A Ojo, K Hose, C Shimizu & P Lisena (eds), The Semantic Web : 22nd European Semantic Web Conference, ESWC 2025, Proceedings. Lecture Notes in Computer Science, vol. 15719 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 244-261, 22nd European Semantic Web Conference, ESWC 2025, Portoroz, Slovenia, 1 Jun 2025. https://doi.org/10.1007/978-3-031-94578-6_14, https://doi.org/10.48550/arXiv.2504.00752
Sadruddin, S., D’Souza, J., Poupaki, E., Watkins, A., Babaei Giglou, H., Rula, A., Karasulu, B., Auer, S., Mackus, A., & Kessels, E. (2025). LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models. In E. Curry, M. Acosta, M. Poveda-Villalón, M. van Erp, A. Ojo, K. Hose, C. Shimizu, & P. Lisena (Eds.), The Semantic Web : 22nd European Semantic Web Conference, ESWC 2025, Proceedings (pp. 244-261). (Lecture Notes in Computer Science; Vol. 15719 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-94578-6_14, https://doi.org/10.48550/arXiv.2504.00752
Sadruddin S, D’Souza J, Poupaki E, Watkins A, Babaei Giglou H, Rula A et al. LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models. In Curry E, Acosta M, Poveda-Villalón M, van Erp M, Ojo A, Hose K, Shimizu C, Lisena P, editors, The Semantic Web : 22nd European Semantic Web Conference, ESWC 2025, Proceedings. Springer Science and Business Media Deutschland GmbH. 2025. p. 244-261. (Lecture Notes in Computer Science). doi: 10.1007/978-3-031-94578-6_14, 10.48550/arXiv.2504.00752
Sadruddin, Sameer ; D’Souza, Jennifer ; Poupaki, Eleni et al. / LLMs4SchemaDiscovery : A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models. The Semantic Web : 22nd European Semantic Web Conference, ESWC 2025, Proceedings. editor / Edward Curry ; Maribel Acosta ; Maria Poveda-Villalón ; Marieke van Erp ; Adegboyega Ojo ; Katja Hose ; Cogan Shimizu ; Pasquale Lisena. Springer Science and Business Media Deutschland GmbH, 2025. pp. 244-261 (Lecture Notes in Computer Science).
@inproceedings{d74c94dc5ffd4eaf9183d210ab54ea68,
title = "LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models",
abstract = "Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science—specifically atomic layer deposition—schema-miner demonstrates that expert-guided LLMs generate semantically rich schemas suitable for diverse real-world applications.",
keywords = "Human-in-the-loop Workflow, Large Language Models, Schema Discovery, Schema Mining, Scientific Schemas",
author = "Sameer Sadruddin and Jennifer D{\textquoteright}Souza and Eleni Poupaki and Alex Watkins and {Babaei Giglou}, Hamed and Anisa Rula and Bora Karasulu and S{\"o}ren Auer and Adrie Mackus and Erwin Kessels",
note = "Publisher Copyright: {\textcopyright} The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.; 22nd European Semantic Web Conference, ESWC 2025, ESWC 2025 ; Conference date: 01-06-2025 Through 05-06-2025",
year = "2025",
month = may,
day = "31",
doi = "10.1007/978-3-031-94578-6_14",
language = "English",
isbn = "9783031945779",
series = "Lecture Notes in Computer Science",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "244--261",
editor = "Edward Curry and Maribel Acosta and Maria Poveda-Villal{\'o}n and {van Erp}, Marieke and Adegboyega Ojo and Katja Hose and Cogan Shimizu and Pasquale Lisena",
booktitle = "The Semantic Web",
address = "Germany",

}

TY - GEN

T1 - LLMs4SchemaDiscovery

T2 - 22nd European Semantic Web Conference, ESWC 2025

AU - Sadruddin, Sameer

AU - D’Souza, Jennifer

AU - Poupaki, Eleni

AU - Watkins, Alex

AU - Babaei Giglou, Hamed

AU - Rula, Anisa

AU - Karasulu, Bora

AU - Auer, Sören

AU - Mackus, Adrie

AU - Kessels, Erwin

N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

PY - 2025/5/31

Y1 - 2025/5/31

N2 - Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science—specifically atomic layer deposition—schema-miner demonstrates that expert-guided LLMs generate semantically rich schemas suitable for diverse real-world applications.

AB - Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science—specifically atomic layer deposition—schema-miner demonstrates that expert-guided LLMs generate semantically rich schemas suitable for diverse real-world applications.

KW - Human-in-the-loop Workflow

KW - Large Language Models

KW - Schema Discovery

KW - Schema Mining

KW - Scientific Schemas

UR - http://www.scopus.com/inward/record.url?scp=105007760723&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-94578-6_14

DO - 10.1007/978-3-031-94578-6_14

M3 - Conference contribution

AN - SCOPUS:105007760723

SN - 9783031945779

T3 - Lecture Notes in Computer Science

SP - 244

EP - 261

BT - The Semantic Web

A2 - Curry, Edward

A2 - Acosta, Maribel

A2 - Poveda-Villalón, Maria

A2 - van Erp, Marieke

A2 - Ojo, Adegboyega

A2 - Hose, Katja

A2 - Shimizu, Cogan

A2 - Lisena, Pasquale

PB - Springer Science and Business Media Deutschland GmbH

Y2 - 1 June 2025 through 5 June 2025

ER -

By the same author(s)