@inproceedings{xu-murray-2022-por,
title = "Por Qu{\'e} N{\~a}o Utiliser Alla Spr{\r{a}}k? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer",
author = "Xu, Haoran and
Murray, Kenton",
editor = "Carpuat, Marine and
de Marneffe, Marie-Catherine and
Meza Ruiz, Ivan Vladimir",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2022",
month = jul,
year = "2022",
address = "Seattle, United States",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/landing_page/2022.findings-naacl.157/",
doi = "10.18653/v1/2022.findings-naacl.157",
pages = "2043--2059",
abstract = "The current state-of-the-art for few-shot cross-lingual transfer learning first trains on abundant labeled data in the source language and then fine-tunes with a few examples on the target language, termed target-adapting. Though this has been demonstrated to work on a variety of tasks, in this paper we show some deficiencies of this approach and propose a one-step mixed training method that trains on both source and target data with stochastic gradient surgery, a novel gradient-level optimization. Unlike the previous studies that focus on one language at a time when target-adapting, we use one model to handle all target languages simultaneously to avoid excessively language-specific models. Moreover, we discuss the unreality of utilizing large target development sets for model selection in previous literature. We further show that our method is both development-free for target languages, and is also able to escape from overfitting issues. We conduct a large-scale experiment on 4 diverse NLP tasks across up to 48 languages. Our proposed method achieves state-of-the-art performance on all tasks and outperforms target-adapting by a large margin, especially for languages that are linguistically distant from the source language, e.g., 7.36{\%} F1 absolute gain on average for the NER task, up to 17.60{\%} on Punjabi."
}
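
The abstract mentions "stochastic gradient surgery" only at a high level. The following is a minimal, hypothetical PyTorch sketch of projection-based gradient surgery in the style of PCGrad (Yu et al., 2020), not the paper's exact algorithm; the function name `surgery_step` and the `model`/`optimizer`/loss arguments are illustrative assumptions.

```python
import torch


def surgery_step(model, loss_src, loss_tgt, optimizer):
    """One optimizer step with projection-based gradient surgery (PCGrad-style).

    If the source-language and target-language gradients conflict
    (negative dot product), each is projected onto the normal plane
    of the other before they are summed. Illustrative sketch only.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Per-loss gradients over the shared parameters.
    g_src = torch.autograd.grad(loss_src, params, retain_graph=True, allow_unused=True)
    g_tgt = torch.autograd.grad(loss_tgt, params, allow_unused=True)

    def flatten(grads):
        # Treat unused parameters as having zero gradient.
        return torch.cat([
            (g if g is not None else torch.zeros_like(p)).reshape(-1)
            for g, p in zip(grads, params)
        ])

    v_src, v_tgt = flatten(g_src), flatten(g_tgt)

    # Project away the conflicting components (only when the gradients disagree).
    dot = torch.dot(v_src, v_tgt)
    if dot < 0:
        v_src, v_tgt = (
            v_src - dot / v_tgt.norm() ** 2 * v_tgt,
            v_tgt - dot / v_src.norm() ** 2 * v_src,
        )

    combined = v_src + v_tgt

    # Write the merged gradient back into the parameters and take the step.
    optimizer.zero_grad()
    offset = 0
    for p in params:
        n = p.numel()
        p.grad = combined[offset:offset + n].view_as(p).clone()
        offset += n
    optimizer.step()
```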