Unifying Cross-Lingual Transfer across Scenarios of Resource Scarcity
Alan Ansell, Marinela Parović, Ivan Vulić, Anna Korhonen, Edoardo Ponti
Abstract
The scarcity of data in many of the world’s languages necessitates the transfer of knowledge from other, resource-rich languages. However, the level of scarcity varies significantly across multiple dimensions, including: i) the amount of task-specific data available in the source and target languages; ii) the amount of monolingual and parallel data available for both languages; and iii) the extent to which they are supported by pretrained multilingual and translation models. Prior work has largely treated these dimensions and the various techniques for dealing with them separately; in this paper, we offer a more integrated view by exploring how to deploy the arsenal of cross-lingual transfer tools across a range of scenarios, especially the most challenging, low-resource ones. To this end, we run experiments on the AmericasNLI and NusaX benchmarks over 20 languages, simulating a range of few-shot settings. The best configuration in our experiments employed parameter-efficient language and task adaptation of massively multilingual Transformers, trained simultaneously on source language data and both machine-translated and natural data for multiple target languages. In addition, we show that pre-trained translation models can be easily adapted to unseen languages, thus extending the range of our hybrid technique and translation-based transfer more broadly. Beyond new insights into the mechanisms of cross-lingual transfer, we hope our work will provide practitioners with a toolbox to integrate multiple techniques for different real-world scenarios. Our code is available at https://github.com/parovicm/unified-xlt.
- Anthology ID: 2023.emnlp-main.242
- Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 3980–3995
- URL: https://aclanthology.org/2023.emnlp-main.242
- DOI: 10.18653/v1/2023.emnlp-main.242
- Cite (ACL): Alan Ansell, Marinela Parović, Ivan Vulić, Anna Korhonen, and Edoardo Ponti. 2023. Unifying Cross-Lingual Transfer across Scenarios of Resource Scarcity. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3980–3995, Singapore. Association for Computational Linguistics.
- Cite (Informal): Unifying Cross-Lingual Transfer across Scenarios of Resource Scarcity (Ansell et al., EMNLP 2023)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/2023.emnlp-main.242.pdf