Yuval Gorodissky
2025
Cross-Lingual Extractive Question Answering with Unanswerable Questions
Yuval Gorodissky | Elior Sulem | Dan Roth
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)
Cross-lingual Extractive Question Answering (EQA) extends standard EQA by requiring models to find answers in passages written in a language different from that of the question. The Generalized Cross-Lingual Transfer (G-XLT) task evaluates models’ zero-shot ability to transfer question-answering capabilities across languages using only English training data. While previous research has focused primarily on scenarios in which an answer is always present, real-world applications often involve questions whose answer does not appear in the given context. This paper introduces an enhanced G-XLT task definition that explicitly handles unanswerable questions, bridging a critical gap in current research. To address this challenge, we present two new datasets, miXQuAD and MLQA-IDK, which contain both answerable and unanswerable questions and cover 12 and 7 language pairs, respectively. Our study evaluates state-of-the-art large language models using fine-tuning, parameter-efficient fine-tuning techniques, and in-context learning, revealing a trade-off between a smaller fine-tuned model’s performance on answerable questions and a larger in-context learning model’s capability on unanswerable questions. We also examine language similarity patterns derived from model performance and find that they align with known language families.
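The abstract does not spell out how the enhanced G-XLT setting is scored. The sketch below assumes the SQuAD 2.0 convention, in which an empty prediction means "no answer in this context," and computes exact match per (question language, context language) pair. It reports answerable and unanswerable questions separately so that the trade-off described above stays visible instead of being averaged away. `Example`, `normalize`, `exact_match`, and `em_by_pair` are hypothetical names. The normalization is deliberately simplified (lowercasing, ASCII punctuation only, no article stripping), so this is an illustration of the task shape, not the paper's metric.

```python
from collections import defaultdict
from dataclasses import dataclass, field
import string


@dataclass
class Example:
    question_lang: str  # e.g. "en"
    context_lang: str   # e.g. "de"; must differ from question_lang in G-XLT
    prediction: str     # model output; "" is assumed to mean "no answer"
    gold_answers: list[str] = field(default_factory=list)  # [] marks an unanswerable question


_PUNCT = set(string.punctuation)  # ASCII punctuation only (simplification)


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> float:
    """1.0 if the prediction matches any gold span; for unanswerable questions,
    1.0 only if the model abstains with an empty prediction."""
    if not gold_answers:
        return float(normalize(prediction) == "")
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))


def em_by_pair(examples: list[Example]) -> dict:
    """Mean EM per (question_lang, context_lang) pair, split by answerability.
    A bucket with no examples is reported as None."""
    scores = defaultdict(lambda: {"answerable": [], "unanswerable": []})
    for ex in examples:
        if ex.question_lang == ex.context_lang:
            continue  # G-XLT only scores cross-lingual pairs
        bucket = "answerable" if ex.gold_answers else "unanswerable"
        scores[(ex.question_lang, ex.context_lang)][bucket].append(
            exact_match(ex.prediction, ex.gold_answers)
        )
    return {
        pair: {k: (sum(v) / len(v) if v else None) for k, v in buckets.items()}
        for pair, buckets in scores.items()
    }


if __name__ == "__main__":
    demo = [
        Example("en", "de", "Berlin", ["Berlin"]),  # answerable, correct span
        Example("en", "de", "", []),                # unanswerable, model abstains
        Example("en", "de", "Paris", []),           # unanswerable, spurious answer
        Example("de", "en", "", ["1990"]),          # answerable, model wrongly abstains
    ]
    print(em_by_pair(demo))
```

In this toy run, the en→de pair scores 1.0 on answerable and 0.5 on unanswerable questions, because only the abstaining prediction earns credit on the unanswerable items. Keeping the two buckets apart is one simple way to surface the difference between a model that extracts spans well and one that abstains appropriately. Whitespace tokenization and ASCII-only punctuation would need language-specific handling for scripts such as Chinese.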