Composable Sparse Fine-Tuning for Cross-Lingual Transfer

Alan Ansell; Edoardo Ponti; Anna Korhonen; Ivan Vulić

doi:10.18653/v1/2022.acl-long.125

Abstract

Fine-tuning the entire set of parameters of a large pretrained model has become the mainstream approach for transfer learning. To increase its efficiency and prevent catastrophic forgetting and interference, techniques like adapters and sparse fine-tuning have been developed. Adapters are modular, as they can be combined to adapt a model towards different facets of knowledge (e.g., dedicated language and/or task adapters). Sparse fine-tuning is expressive, as it controls the behavior of all model components. In this work, we introduce a new fine-tuning method with both these desirable properties. In particular, we learn sparse, real-valued masks based on a simple variant of the Lottery Ticket Hypothesis. Task-specific masks are obtained from annotated data in a source language, and language-specific masks from masked language modeling in a target language. Both these masks can then be composed with the pretrained model. Unlike adapter-based fine-tuning, this method neither increases the number of parameters at inference time nor alters the original model architecture. Most importantly, it outperforms adapters in zero-shot cross-lingual transfer by a large margin in a series of multilingual benchmarks, including Universal Dependencies, MasakhaNER, and AmericasNLI. Based on an in-depth analysis, we additionally find that sparsity is crucial to prevent both 1) interference between the fine-tunings to be composed and 2) overfitting. We release the code and models at https://github.com/cambridgeltl/composable-sft.
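The composition step described in the abstract can be illustrated with a toy sketch. It assumes that each sparse fine-tuning is stored as a difference vector relative to the pretrained weights, that the entries to keep are the ones that moved most during fine-tuning, and that composition is element-wise addition; the abstract does not spell out these details, and the actual training procedure is not reproduced here. The names `sparse_diff` and `compose` and all numeric values are hypothetical.

```python
def sparse_diff(pretrained, fine_tuned, k):
    """Return a sparse difference vector: keep the k entries that changed
    most (by absolute value) during fine-tuning and zero out the rest.
    Assumption: magnitude-based selection stands in for the paper's method."""
    diffs = [f - p for p, f in zip(pretrained, fine_tuned)]
    ranked = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]), reverse=True)
    keep = set(ranked[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(diffs)]


def compose(pretrained, *sparse_diffs):
    """Add any number of sparse difference vectors onto the pretrained weights.
    The result has the same shape as the input, so no parameters are added."""
    composed = list(pretrained)
    for phi in sparse_diffs:
        composed = [w + d for w, d in zip(composed, phi)]
    return composed


# Hypothetical flattened weights of a tiny "pretrained model".
pretrained = [0.0, 1.0, -1.0, 0.5, 2.0, -0.5]

# Hypothetical weights after task fine-tuning (source language, annotated data)
# and after language fine-tuning (target language, masked language modeling).
task_tuned = [0.9, 1.0, -1.1, 0.5, 2.0, -0.5]
lang_tuned = [0.0, 1.05, -1.0, 0.5, 1.2, -0.5]

phi_task = sparse_diff(pretrained, task_tuned, k=1)
phi_lang = sparse_diff(pretrained, lang_tuned, k=1)

# Zero-shot transfer model: pretrained weights plus both sparse updates.
composed = compose(pretrained, phi_task, phi_lang)

# Indices touched by both updates; sparser updates make collisions less likely.
overlap = [i for i, (a, b) in enumerate(zip(phi_task, phi_lang)) if a != 0.0 and b != 0.0]
```

In this toy setting the two updates touch disjoint positions, so neither overwrites the other. That matches the abstract's observation that sparsity limits interference between the fine-tunings being composed.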

Anthology ID: 2022.acl-long.125
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1778–1796
URL: https://aclanthology.org/2022.acl-long.125
DOI: 10.18653/v1/2022.acl-long.125
Cite (ACL): Alan Ansell, Edoardo Ponti, Anna Korhonen, and Ivan Vulić. 2022. Composable Sparse Fine-Tuning for Cross-Lingual Transfer. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1778–1796, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Composable Sparse Fine-Tuning for Cross-Lingual Transfer (Ansell et al., ACL 2022)
PDF: https://preview.aclanthology.org/nodalida-main-page/2022.acl-long.125.pdf
Video: https://preview.aclanthology.org/nodalida-main-page/2022.acl-long.125.mp4
Code: cambridgeltl/composable-sft
Data: CoNLL-2003, GLUE, MLQA, MasakhaNER, MultiNLI, SQuAD, XQuAD
