Zero-Shot vs. Translation-Based Cross-Lingual Transfer: The Case of Lexical Gaps

Abteen Ebrahimi, Katharina Wense


Abstract
Cross-lingual transfer can be achieved through two main approaches: zero-shot transfer or machine translation (MT). While the former has been the dominant approach, both have been shown to be competitive. In this work, we compare the current performance and long-term viability of these methods. We leverage lexical gaps to create a multilingual question answering dataset, which provides a difficult domain for evaluation. Both approaches struggle in this setting, though zero-shot transfer performs better, as current MT outputs are not specific enough for the task. Using oracle translation offers the best performance, showing that this approach can perform well in the long term; however, current MT quality is a bottleneck. We also conduct an exploratory study to see whether humans produce translations sufficient for the task with only general instructions. We find this to be true for the majority of translators, but not all. This indicates that while translation has the potential to outperform zero-shot approaches, creating MT models that generate accurate task-specific translations may not be straightforward.
Anthology ID:
2024.naacl-short.37
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Kevin Duh, Helena Gomez, Steven Bethard
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
443–458
URL:
https://aclanthology.org/2024.naacl-short.37
Cite (ACL):
Abteen Ebrahimi and Katharina Wense. 2024. Zero-Shot vs. Translation-Based Cross-Lingual Transfer: The Case of Lexical Gaps. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), pages 443–458, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Zero-Shot vs. Translation-Based Cross-Lingual Transfer: The Case of Lexical Gaps (Ebrahimi & Wense, NAACL 2024)
PDF:
https://preview.aclanthology.org/ingestion-checklist/2024.naacl-short.37.pdf