Generative Information Extraction from Biographical Sources

Robin Winkle, Manfred Stede, Jörn Kreutel


Abstract
Biographical sources, such as literature encyclopedias, encode knowledge about historical figures in textual form. In this paper, we address the task of consolidating structured biographical information about authors from the former German Democratic Republic into a unified database. To this end, we present a generalizable Information Extraction (IE) system based on LLM prompting. Specifically, we compare two midsized open-source models, Qwen-2.5-32B and Llama-3-70B-Instruct, investigate a range of Prompt Engineering (PE) strategies, and propose a semantic similarity-based evaluation metric for open-ended IE. Our experiments on an unpublished annotated subset of biographical texts deliver moderate precision and variable recall, highlighting both the potential and current limitations of generative IE in the Digital Humanities.
Anthology ID:
2026.latechclfl-1.30
Volume:
Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Diego Alves, Yuri Bizzoni, Stefania Degaetano-Ortlieb, Anna Kazantseva, Janis Pagel, Stan Szpakowicz
Venues:
LaTeCH-CLfL | WS
SIG:
SIGHUM
Publisher:
Association for Computational Linguistics
Pages:
311–322
URL:
https://preview.aclanthology.org/ingest-eacl/2026.latechclfl-1.30/
Cite (ACL):
Robin Winkle, Manfred Stede, and Jörn Kreutel. 2026. Generative Information Extraction from Biographical Sources. In Proceedings of the 10th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature 2026, pages 311–322, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Generative Information Extraction from Biographical Sources (Winkle et al., LaTeCH-CLfL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.latechclfl-1.30.pdf
Supplementary material:
2026.latechclfl-1.30.SupplementaryMaterial.zip