Abstract
Research on token-level reference-free hallucination detection has predominantly focused on English, primarily due to the scarcity of robust datasets in other languages. This has hindered systematic investigations into the effectiveness of cross-lingual transfer for this important NLP application. To address this gap, we introduce ANHALTEN, a new evaluation dataset that extends the English hallucination detection dataset to German. To the best of our knowledge, this is the first work that explores cross-lingual transfer for token-level reference-free hallucination detection. ANHALTEN contains gold annotations in German that are parallel (i.e., directly comparable to the original English instances). We benchmark several prominent cross-lingual transfer approaches, demonstrating that larger context length leads to better hallucination detection in German, even without succeeding context. Importantly, we show that the sample-efficient few-shot transfer is the most effective approach in most setups. This highlights the practical benefits of minimal annotation effort in the target language for reference-free hallucination detection. Aiming to catalyze future research on cross-lingual token-level reference-free hallucination detection, we make ANHALTEN publicly available: https://github.com/janekh24/anhalten
- Anthology ID:
- 2024.acl-srw.18
- Volume:
- Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Xiyan Fu, Eve Fleisig
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 92–100
- URL:
- https://aclanthology.org/2024.acl-srw.18
- DOI:
- 10.18653/v1/2024.acl-srw.18
- Cite (ACL):
- Janek Herrlein, Chia-Chien Hung, and Goran Glavaš. 2024. ANHALTEN: Cross-Lingual Transfer for German Token-Level Reference-Free Hallucination Detection. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 92–100, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- ANHALTEN: Cross-Lingual Transfer for German Token-Level Reference-Free Hallucination Detection (Herrlein et al., ACL 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.acl-srw.18.pdf