Evaluating Open-Source LLMs for Text Summarization and Named Entity Recognition in Long, Unstructured Text

Pauline Kister, Miriam Schirmer


Abstract
This work investigates the extent to which open-source Large Language Models (LLMs) can improve accessibility of unstructured historical documents by performing abstractive summarization and fine-grained Named Entity Recognition (NER) for role classification and violation types. We evaluate open-source LLMs in zero-shot settings and apply these tasks to witness testimonies collected by the South African Truth and Reconciliation Commission (TRC), which archived a large body of text documenting human rights violations during apartheid. Despite their historical significance, these texts are difficult to access due to their length, lack of standardized structure, and the absence of systematic indexing. Open-source LLMs show strong performance in summarization, with most models surpassing non-LLM baselines (maximum BERTScore 0.77), while NER performance remains limited (maximum F1-score 0.61). Results suggest a trade-off in which stylistic fluency is prioritized over factual precision. A two-stage pipeline, summarization followed by NER on LLM summaries, leads to measurable improvements.
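For readers unfamiliar with how an NER F1-score such as the one reported above is typically obtained, the following is a minimal sketch of micro-averaged precision, recall, and F1 over exact-match (entity text, label) pairs. The abstract does not state the matching criterion or label set the authors used, so the exact-match rule, the function name `entity_f1`, and the labels and entity strings in the example are all illustrative assumptions.

```python
from collections import Counter


def entity_f1(gold, predicted):
    """Micro-averaged precision, recall, and F1 over (text, label) pairs.

    Assumption: a prediction counts as correct only if both the entity
    text and its label match a gold annotation exactly.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: each gold entity can be matched at most once.
    true_pos = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations; labels and entity strings are placeholders.
gold = [("Person A", "VICTIM"), ("Person B", "PERPETRATOR"), ("assault", "VIOLATION")]
pred = [("Person A", "VICTIM"), ("Person B", "VICTIM"), ("assault", "VIOLATION")]
precision, recall, f1 = entity_f1(gold, pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case one role label is wrong, so two of three predictions match and precision, recall, and F1 all equal 2/3. Relaxed criteria (partial span overlap, or scoring the label separately from the span) would yield different numbers.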
Anthology ID:
2026.nlp4dh-1.35
Volume:
Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities
Month:
July
Year:
2026
Address:
San Diego, USA
Editors:
Sil Hamilton, Emily Öhman, Rebecca M. M. Hicke, Yuri Bizzoni, Axel Bax, Jacob A. Matthews, Mika Hämäläinen
Venues:
NLP4DH | WS
Publisher:
Association for Computational Linguistics
Pages:
390–410
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.nlp4dh-1.35/
Cite (ACL):
Pauline Kister and Miriam Schirmer. 2026. Evaluating Open-Source LLMs for Text Summarization and Named Entity Recognition in Long, Unstructured Text. In Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities, pages 390–410, San Diego, USA. Association for Computational Linguistics.
Cite (Informal):
Evaluating Open-Source LLMs for Text Summarization and Named Entity Recognition in Long, Unstructured Text (Kister & Schirmer, NLP4DH 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.nlp4dh-1.35.pdf