Details
Original language | English |
---|---|
Article number | 12 |
Journal | International Journal on Digital Libraries |
Volume | 26 |
Issue number | 2 |
Publication status | Published - 23 Jun 2025 |
Abstract
Creating collections of societally impactful events is a challenging task given the sheer amount of information about such events covering a large variety of aspects and perspectives in web archives and the live web. The automatic creation of such collections from web archives typically does not live up to the high standards of web archivists, who put lots of manual effort into carefully curating collections. Furthermore, the lack of engaging presentation methods sets up a burden for any users aiming to interact effectively with event collections in order to explore an event in its entirety. Therefore, we (i) conduct expert interviews to determine the requirements for building and utilising event collections from the perspectives of web archivists, (ii) introduce EventExplorer – a retrieval-augmented generation (RAG) approach to create event collections through efficient retrieval and diversified ranking – and make it available in an interactive web system, (iii) apply EventExplorer on different sources including a web archive and the live web, (iv) discuss which requirements are met by EventExplorer as well as the challenges that remain for future work, with a specific emphasis on the distinctive characteristics of both archived web and the live web environments. We demonstrate the effectiveness of EventExplorer applied on web archives through a user study of our interactive system. Then, we transfer our lessons learned to the live web by creating event collections of 166 elections in Europe. Our evaluation results show the effectiveness of EventExplorer in addressing the requirements identified in our expert interviews. Further, we derive a set of challenges and potential future steps for bringing together the automatic creation of web archive collections and manual curation. Finally, we discuss how to make web archives ready for their use in RAG systems.
Keywords
- Event collections, Knowledge graph, Retrieval-augmented generation, Web archives
ASJC Scopus subject areas
- Social Sciences (all)
- Library and Information Sciences
Cite this
In: International Journal on Digital Libraries, Vol. 26, No. 2, 12, 23.06.2025.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Retrieval-Augmented Generation of Event Collections from Web Archives and the Live Web
AU - Abdollahi, Sara
AU - Nejdl, Wolfgang
AU - Gottschalk, Simon
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/6/23
Y1 - 2025/6/23
N2 - Creating collections of societally impactful events is a challenging task given the sheer amount of information about such events covering a large variety of aspects and perspectives in web archives and the live web. The automatic creation of such collections from web archives typically does not live up to the high standards of web archivists, who put lots of manual effort into carefully curating collections. Furthermore, the lack of engaging presentation methods sets up a burden for any users aiming to interact effectively with event collections in order to explore an event in its entirety. Therefore, we (i) conduct expert interviews to determine the requirements for building and utilising event collections from the perspectives of web archivists, (ii) introduce EventExplorer – a retrieval-augmented generation (RAG) approach to create event collections through efficient retrieval and diversified ranking – and make it available in an interactive web system, (iii) apply EventExplorer on different sources including a web archive and the live web, (iv) discuss which requirements are met by EventExplorer as well as the challenges that remain for future work, with a specific emphasis on the distinctive characteristics of both archived web and the live web environments. We demonstrate the effectiveness of EventExplorer applied on web archives through a user study of our interactive system. Then, we transfer our lessons learned to the live web by creating event collections of 166 elections in Europe. Our evaluation results show the effectiveness of EventExplorer in addressing the requirements identified in our expert interviews. Further, we derive a set of challenges and potential future steps for bringing together the automatic creation of web archive collections and manual curation. Finally, we discuss how to make web archives ready for their use in RAG systems.
KW - Event collections
KW - Knowledge graph
KW - Retrieval-augmented generation
KW - Web archives
UR - http://www.scopus.com/inward/record.url?scp=105008970912&partnerID=8YFLogxK
U2 - 10.1007/s00799-025-00419-7
DO - 10.1007/s00799-025-00419-7
M3 - Article
AN - SCOPUS:105008970912
VL - 26
JO - International Journal on Digital Libraries
JF - International Journal on Digital Libraries
SN - 1432-5012
IS - 2
M1 - 12
ER -