Abstract
Human knowledge is collectively encoded in the roughly 6500 languages spoken around the world, but it is not distributed equally across languages. Hence, for information-seeking question answering (QA) systems to adequately serve speakers of all languages, they need to operate cross-lingually. In this work we investigate the capabilities of multilingually pretrained language models on cross-lingual QA. We find that explicitly aligning the representations across languages with a post-hoc fine-tuning step generally leads to improved performance. We additionally investigate the effect of data size as well as the language choice in this fine-tuning step, and we release a dataset for evaluating cross-lingual QA systems.
- Anthology ID:
- 2021.mrqa-1.14
- Volume:
- Proceedings of the 3rd Workshop on Machine Reading for Question Answering
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Editors:
- Adam Fisch, Alon Talmor, Danqi Chen, Eunsol Choi, Minjoon Seo, Patrick Lewis, Robin Jia, Sewon Min
- Venue:
- MRQA
- Publisher:
- Association for Computational Linguistics
- Pages:
- 133–148
- URL:
- https://aclanthology.org/2021.mrqa-1.14
- DOI:
- 10.18653/v1/2021.mrqa-1.14
- Cite (ACL):
- Fahim Faisal and Antonios Anastasopoulos. 2021. Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering. In Proceedings of the 3rd Workshop on Machine Reading for Question Answering, pages 133–148, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Investigating Post-pretraining Representation Alignment for Cross-Lingual Question Answering (Faisal & Anastasopoulos, MRQA 2021)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2021.mrqa-1.14.pdf
- Data:
- MKQA, MLQA, SQuAD, TyDiQA, XQuAD
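
To give a concrete picture of what the "explicitly aligning the representations across languages with a post-hoc fine-tuning step" mentioned in the abstract can look like in practice, here is a minimal, hypothetical sketch using PyTorch and Hugging Face Transformers. It is an illustration of the general idea only, not the authors' recipe from the paper: the model name, the mean-pooling, the MSE alignment objective, and the toy parallel sentence pairs are all assumptions.

```python
# Hypothetical sketch: post-hoc fine-tuning of a multilingual encoder so that
# embeddings of parallel sentences move closer together. This illustrates the
# general notion of cross-lingual representation alignment; it is NOT the
# exact method of Faisal & Anastasopoulos (2021).
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-multilingual-cased"  # assumed; any multilingual encoder
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy parallel data (source sentence, translation). A real setup would use a
# parallel corpus, e.g. translated QA questions.
parallel_pairs = [
    ("Where was the author born?", "¿Dónde nació el autor?"),
    ("The capital of France is Paris.", "La capital de Francia es París."),
]

def mean_pooled(texts):
    """Encode a batch of sentences and mean-pool the token embeddings."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state      # (batch, seq_len, dim)
    mask = batch["attention_mask"].unsqueeze(-1)   # (batch, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

model.train()
for epoch in range(3):
    src, tgt = zip(*parallel_pairs)
    src_emb = mean_pooled(list(src))
    tgt_emb = mean_pooled(list(tgt))
    # Alignment objective: pull parallel sentence embeddings together.
    loss = torch.nn.functional.mse_loss(src_emb, tgt_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: alignment loss = {loss.item():.4f}")
```

A simple MSE pull-together loss is used here only for brevity; alignment objectives in the literature also include contrastive losses with negative samples, which avoid collapsing all sentences to a single point when training at scale.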