Unifying Cross-Lingual Transfer across Scenarios of Resource Scarcity

Alan Ansell, Marinela Parović, Ivan Vulić, Anna Korhonen, Edoardo Ponti


Abstract
The scarcity of data in many of the world’s languages necessitates the transfer of knowledge from other, resource-rich languages. However, the level of scarcity varies significantly across multiple dimensions, including: i) the amount of task-specific data available in the source and target languages; ii) the amount of monolingual and parallel data available for both languages; and iii) the extent to which they are supported by pretrained multilingual and translation models. Prior work has largely treated these dimensions and the various techniques for dealing with them separately; in this paper, we offer a more integrated view by exploring how to deploy the arsenal of cross-lingual transfer tools across a range of scenarios, especially the most challenging, low-resource ones. To this end, we run experiments on the AmericasNLI and NusaX benchmarks over 20 languages, simulating a range of few-shot settings. The best configuration in our experiments employed parameter-efficient language and task adaptation of massively multilingual Transformers, trained simultaneously on source language data and both machine-translated and natural data for multiple target languages. In addition, we show that pre-trained translation models can be easily adapted to unseen languages, thus extending the range of our hybrid technique and translation-based transfer more broadly. Beyond new insights into the mechanisms of cross-lingual transfer, we hope our work will provide practitioners with a toolbox to integrate multiple techniques for different real-world scenarios. Our code is available at https://github.com/parovicm/unified-xlt.
Anthology ID:
2023.emnlp-main.242
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3980–3995
URL:
https://aclanthology.org/2023.emnlp-main.242
DOI:
10.18653/v1/2023.emnlp-main.242
Cite (ACL):
Alan Ansell, Marinela Parović, Ivan Vulić, Anna Korhonen, and Edoardo Ponti. 2023. Unifying Cross-Lingual Transfer across Scenarios of Resource Scarcity. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3980–3995, Singapore. Association for Computational Linguistics.
Cite (Informal):
Unifying Cross-Lingual Transfer across Scenarios of Resource Scarcity (Ansell et al., EMNLP 2023)
PDF:
https://preview.aclanthology.org/naacl-24-ws-corrections/2023.emnlp-main.242.pdf
Video:
https://preview.aclanthology.org/naacl-24-ws-corrections/2023.emnlp-main.242.mp4