Nicolas Lazzari
2025
KE-MHISTO: Towards a Multilingual Historical Knowledge Extraction Benchmark for Addressing the Long-Tail Problem
Arianna Graciotti | Leonardo Piano | Nicolas Lazzari | Enrico Daga | Rocco Tripodi | Valentina Presutti | Livio Pompianu
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models (LLMs) face significant challenges when queried about long-tail knowledge, i.e., information that is rarely encountered during training. These difficulties stem from the inherent sparsity of such data. Furthermore, LLMs often cannot verify or ground their responses in authoritative sources, which can lead to plausible yet inaccurate outputs on infrequent subject matter. We investigate these phenomena by introducing KE-MHISTO, a multilingual benchmark for Entity Linking and Question Answering in the domain of historical music knowledge, available in both Italian and English. We demonstrate that KE-MHISTO provides significantly broader coverage of long-tail knowledge than existing alternatives and that it poses substantial challenges for state-of-the-art models. Our experiments reveal that smaller, multilingual models can achieve performance comparable to that of significantly larger counterparts, highlighting the potential of efficient, language-aware approaches for long-tail knowledge extraction. KE-MHISTO is available at: https://github.com/polifonia-project/KE-MHISTO.
Co-authors
- Enrico Daga 1
- Arianna Graciotti 1
- Leonardo Piano 1
- Livio Pompianu 1
- Valentina Presutti 1