Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer with Fine-tuning Slow and Fast

Yiduo Guo, Yaobo Liang, Dongyan Zhao, Bing Liu, Nan Duan


Abstract
Existing research has shown that a multilingual pre-trained language model fine-tuned on only one (source) language also performs well on downstream tasks in non-source languages, even though no fine-tuning is done in those languages. However, there is a clear gap between the performance on the source language and on the non-source languages. This paper analyzes the fine-tuning process, discovering when the performance gap changes and identifying which network weights affect overall performance most. It also investigates to what extent the gap can be reduced by reducing forgetting. Based on the analysis, a method named Fine-tuning Slow and Fast, with four training policies, is proposed to address these issues. Experimental results show that the proposed method outperforms the baselines by a clear margin.
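The abstract only names the method, so the sketch below is a rough illustration of the "slow and fast" idea: updating two groups of weights at different speeds via PyTorch optimizer parameter groups. It is a minimal sketch, not the paper's actual four training policies; the model checkpoint, the layer split, and the learning rates are all illustrative assumptions.

```python
# Minimal sketch of two-speed fine-tuning (NOT the paper's exact method).
# Assumption: the embeddings and lower encoder layers hold cross-lingual
# knowledge and are fine-tuned slowly; the remaining weights are fine-tuned
# at a standard rate. Checkpoint, split, and learning rates are illustrative.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3
)

# Treat embeddings and the first four encoder layers as the "slow" group.
slow_keywords = ("embeddings", "layer.0.", "layer.1.", "layer.2.", "layer.3.")

slow_params, fast_params = [], []
for name, param in model.named_parameters():
    if any(k in name for k in slow_keywords):
        slow_params.append(param)  # fine-tune slowly to limit forgetting
    else:
        fast_params.append(param)  # fine-tune fast to fit the source task

optimizer = torch.optim.AdamW(
    [
        {"params": slow_params, "lr": 2e-6},  # small LR: slow group
        {"params": fast_params, "lr": 2e-5},  # standard LR: fast group
    ],
    weight_decay=0.01,
)
```

The design intuition, consistent with the abstract's framing, is that slowing updates to the weights most responsible for cross-lingual transfer reduces forgetting and thereby narrows the source/non-source performance gap.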
Anthology ID:
2023.acl-long.221
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
4002–4017
URL:
https://aclanthology.org/2023.acl-long.221
DOI:
10.18653/v1/2023.acl-long.221
Cite (ACL):
Yiduo Guo, Yaobo Liang, Dongyan Zhao, Bing Liu, and Nan Duan. 2023. Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer with Fine-tuning Slow and Fast. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4002–4017, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Analyzing and Reducing the Performance Gap in Cross-Lingual Transfer with Fine-tuning Slow and Fast (Guo et al., ACL 2023)
PDF:
https://preview.aclanthology.org/improve-issue-templates/2023.acl-long.221.pdf