Andrea Salfinger
2026
An Extreme Multi-label Text Classification (XMTC) Library Dataset: What If We Took "Use of Practical AI in Digital Libraries" Seriously?
Jennifer D'Souza | Sameer Sadruddin | Maximilian Kaehler | Andrea Salfinger | Luca Zaccagna | Francesca Incitti | Lauro Snidaro | Osma Suominen
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers’ work.
2025
LA²I²F at SemEval-2025 Task 5: Reasoning in Embedding Space – Fusing Analogical and Ontology-based Reasoning for Document Subject Tagging
Andrea Salfinger | Luca Zaccagna | Francesca Incitti | Gianluca De Nardi | Lorenzo Dal Fabbro | Lauro Snidaro
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
The LLMs4Subjects shared task invited system contributions that leverage a technical library’s tagged document corpus to learn document subject tagging, i.e., proposing adequate subjects given a document’s title and abstract. To address the imbalance of this training corpus, team LA²I²F devised a semantic retrieval-based system fusing the results of ontological and analogical reasoning in embedding vector space. Our results outperformed a naive baseline of prompting a Llama 3.1-based model, whilst being computationally more efficient and competitive with the state of the art.