An Extreme Multi-label Text Classification (XMTC) Library Dataset: What If We Took "Use of Practical AI in Digital Libraries" Seriously?
Jennifer D'Souza, Sameer Sadruddin, Maximilian Kaehler, Andrea Salfinger, Luca Zaccagna, Francesca Incitti, Lauro Snidaro, Osma Suominen
Abstract
Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but also usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers’ work.
- Anthology ID:
- 2026.lrec-main.12
- Volume:
- Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2026
- Address:
- Palma de Mallorca, Spain
- Editors:
- Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue:
- LREC
- Publisher:
- ELRA Language Resource Association
- Pages:
- 169–184
- URL:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.12/
- Cite (ACL):
- Jennifer D'Souza, Sameer Sadruddin, Maximilian Kaehler, Andrea Salfinger, Luca Zaccagna, Francesca Incitti, Lauro Snidaro, and Osma Suominen. 2026. An Extreme Multi-label Text Classification (XMTC) Library Dataset: What If We Took "Use of Practical AI in Digital Libraries" Seriously?. International Conference on Language Resources and Evaluation, main:169–184.
- Cite (Informal):
- An Extreme Multi-label Text Classification (XMTC) Library Dataset: What If We Took “Use of Practical AI in Digital Libraries” Seriously? (D’Souza et al., LREC 2026)
- PDF:
- https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.12.pdf