Retrieval-Augmented Generation of Event Collections from Web Archives and the Live Web

Research output: Contribution to journal › Article › Research › peer review

Authors

Sara Abdollahi, Wolfgang Nejdl, Simon Gottschalk

Details

Original language: English
Article number: 12
Journal: International Journal on Digital Libraries
Volume: 26
Issue number: 2
Publication status: Published - 23 Jun 2025

Abstract

Creating collections of societally impactful events is a challenging task given the sheer amount of information about such events covering a large variety of aspects and perspectives in web archives and the live web. The automatic creation of such collections from web archives typically does not live up to the high standards of web archivists, who put lots of manual effort into carefully curating collections. Furthermore, the lack of engaging presentation methods sets up a burden for any users aiming to interact effectively with event collections in order to explore an event in its entirety. Therefore, we (i) conduct expert interviews to determine the requirements for building and utilising event collections from the perspectives of web archivists, (ii) introduce EventExplorer – a retrieval-augmented generation (RAG) approach to create event collections through efficient retrieval and diversified ranking – and make it available in an interactive web system, (iii) apply EventExplorer on different sources including a web archive and the live web, (iv) discuss which requirements are met by EventExplorer as well as the challenges that remain for future work, with a specific emphasis on the distinctive characteristics of both archived web and the live web environments. We demonstrate the effectiveness of EventExplorer applied on web archives through a user study of our interactive system. Then, we transfer our lessons learned to the live web by creating event collections of 166 elections in Europe. Our evaluation results show the effectiveness of EventExplorer in addressing the requirements identified in our expert interviews. Further, we derive a set of challenges and potential future steps for bringing together the automatic creation of web archive collections and manual curation. Finally, we discuss how to make web archives ready for their use in RAG systems.

Keywords

    Event collections, Knowledge graph, Retrieval-augmented generation, Web archives

Cite this

Retrieval-Augmented Generation of Event Collections from Web Archives and the Live Web. / Abdollahi, Sara; Nejdl, Wolfgang; Gottschalk, Simon.
In: International Journal on Digital Libraries, Vol. 26, No. 2, 12, 23.06.2025.

Abdollahi S, Nejdl W, Gottschalk S. Retrieval-Augmented Generation of Event Collections from Web Archives and the Live Web. International Journal on Digital Libraries. 2025 Jun 23;26(2):12. doi: 10.1007/s00799-025-00419-7
Abdollahi, Sara ; Nejdl, Wolfgang ; Gottschalk, Simon. / Retrieval-Augmented Generation of Event Collections from Web Archives and the Live Web. In: International Journal on Digital Libraries. 2025 ; Vol. 26, No. 2.
@article{7da7711f3848426a8aa830f3fdaa0a4d,
title = "Retrieval-Augmented Generation of Event Collections from Web Archives and the Live Web",
abstract = "Creating collections of societally impactful events is a challenging task given the sheer amount of information about such events covering a large variety of aspects and perspectives in web archives and the live web. The automatic creation of such collections from web archives typically does not live up to the high standards of web archivists, who put lots of manual effort into carefully curating collections. Furthermore, the lack of engaging presentation methods sets up a burden for any users aiming to interact effectively with event collections in order to explore an event in its entirety. Therefore, we (i) conduct expert interviews to determine the requirements for building and utilising event collections from the perspectives of web archivists, (ii) introduce EventExplorer – a retrieval-augmented generation (RAG) approach to create event collections through efficient retrieval and diversified ranking – and make it available in an interactive web system, (iii) apply EventExplorer on different sources including a web archive and the live web, (iv) discuss which requirements are met by EventExplorer as well as the challenges that remain for future work, with a specific emphasis on the distinctive characteristics of both archived web and the live web environments. We demonstrate the effectiveness of EventExplorer applied on web archives through a user study of our interactive system. Then, we transfer our lessons learned to the live web by creating event collections of 166 elections in Europe. Our evaluation results show the effectiveness of EventExplorer in addressing the requirements identified in our expert interviews. Further, we derive a set of challenges and potential future steps for bringing together the automatic creation of web archive collections and manual curation. Finally, we discuss how to make web archives ready for their use in RAG systems.",
keywords = "Event collections, Knowledge graph, Retrieval-augmented generation, Web archives",
author = "Sara Abdollahi and Wolfgang Nejdl and Simon Gottschalk",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2025.",
year = "2025",
month = jun,
day = "23",
doi = "10.1007/s00799-025-00419-7",
language = "English",
volume = "26",
number = "2",
journal = "International Journal on Digital Libraries",
issn = "1432-5012",
}

TY - JOUR

T1 - Retrieval-Augmented Generation of Event Collections from Web Archives and the Live Web

AU - Abdollahi, Sara

AU - Nejdl, Wolfgang

AU - Gottschalk, Simon

N1 - Publisher Copyright: © The Author(s) 2025.

PY - 2025/6/23

Y1 - 2025/6/23

AB - Creating collections of societally impactful events is a challenging task given the sheer amount of information about such events covering a large variety of aspects and perspectives in web archives and the live web. The automatic creation of such collections from web archives typically does not live up to the high standards of web archivists, who put lots of manual effort into carefully curating collections. Furthermore, the lack of engaging presentation methods sets up a burden for any users aiming to interact effectively with event collections in order to explore an event in its entirety. Therefore, we (i) conduct expert interviews to determine the requirements for building and utilising event collections from the perspectives of web archivists, (ii) introduce EventExplorer – a retrieval-augmented generation (RAG) approach to create event collections through efficient retrieval and diversified ranking – and make it available in an interactive web system, (iii) apply EventExplorer on different sources including a web archive and the live web, (iv) discuss which requirements are met by EventExplorer as well as the challenges that remain for future work, with a specific emphasis on the distinctive characteristics of both archived web and the live web environments. We demonstrate the effectiveness of EventExplorer applied on web archives through a user study of our interactive system. Then, we transfer our lessons learned to the live web by creating event collections of 166 elections in Europe. Our evaluation results show the effectiveness of EventExplorer in addressing the requirements identified in our expert interviews. Further, we derive a set of challenges and potential future steps for bringing together the automatic creation of web archive collections and manual curation. Finally, we discuss how to make web archives ready for their use in RAG systems.

KW - Event collections

KW - Knowledge graph

KW - Retrieval-augmented generation

KW - Web archives

UR - http://www.scopus.com/inward/record.url?scp=105008970912&partnerID=8YFLogxK

U2 - 10.1007/s00799-025-00419-7

DO - 10.1007/s00799-025-00419-7

M3 - Article

AN - SCOPUS:105008970912

VL - 26

JO - International Journal on Digital Libraries

JF - International Journal on Digital Libraries

SN - 1432-5012

IS - 2

M1 - 12

ER -
