One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging for Cross-Lingual Transfer

Fabian David Schmidt; Ivan Vulić; Goran Glavaš

doi:10.18653/v1/2023.findings-emnlp.815

Abstract

Multilingual language models enable zero-shot cross-lingual transfer (ZS-XLT): fine-tuned on sizable source-language task data, they perform the task in target languages without labeled instances. The effectiveness of ZS-XLT hinges on the linguistic proximity between languages and the amount of pretraining data for a language. Because of this, model selection based on source-language validation is unreliable: it picks model snapshots with suboptimal target-language performance. As a remedy, some work optimizes ZS-XLT by extensively tuning hyperparameters: the follow-up work then routinely struggles to replicate the original results. Other work searches over narrower hyperparameter grids, reporting substantially lower performance. In this work, we therefore propose an unsupervised evaluation protocol for ZS-XLT that decouples performance maximization from hyperparameter tuning. As a robust and more transparent alternative to extensive hyperparameter tuning, we propose to accumulatively average snapshots from different runs into a single model. We run broad ZS-XLT experiments on both higher-level semantic tasks (NLI, extractive QA) and a lower-level token classification task (NER) and find that conventional model selection based on source-language validation quickly plateaus to suboptimal ZS-XLT performance. On the other hand, our accumulative run-by-run averaging of models trained with different hyperparameters boosts ZS-XLT performance and closely correlates with “oracle” ZS-XLT, i.e., model selection based on target-language validation performance.
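The abstract describes accumulating snapshots from successive runs into a single averaged model. Below is a minimal sketch of that averaging step. It assumes that each fine-tuning run contributes one snapshot with identical parameter names and shapes, and it represents parameters as plain Python lists instead of framework tensors. The class name `RunningAverage` and the toy values are hypothetical, and the paper's exact choice of snapshots and weighting may differ. The sketch keeps a uniform mean over all runs seen so far, so the averaged model can be re-evaluated after every new run.

```python
from typing import Dict, List

# Hypothetical stand-in for a framework state dict:
# parameter name -> flat list of parameter values.
StateDict = Dict[str, List[float]]


class RunningAverage:
    """Uniform (equal-weight) average of model snapshots, updated run by run."""

    def __init__(self) -> None:
        self.count = 0              # number of snapshots averaged so far
        self.avg: StateDict = {}    # current averaged parameters

    def update(self, snapshot: StateDict) -> StateDict:
        """Fold one more snapshot into the average and return the new average."""
        if self.count == 0:
            # The first snapshot is the average; copy it so callers can't mutate it.
            self.avg = {name: list(vals) for name, vals in snapshot.items()}
        else:
            if snapshot.keys() != self.avg.keys():
                raise ValueError("snapshots must share parameter names")
            n = self.count + 1
            for name, vals in snapshot.items():
                cur = self.avg[name]
                if len(vals) != len(cur):
                    raise ValueError(f"shape mismatch for parameter {name!r}")
                # Incremental mean: avg_n = avg_{n-1} + (x_n - avg_{n-1}) / n
                self.avg[name] = [a + (x - a) / n for a, x in zip(cur, vals)]
        self.count += 1
        return self.avg


if __name__ == "__main__":
    # Three toy "runs" (e.g., different learning rates or seeds).
    runs = [
        {"w": [1.0, 2.0], "b": [0.0]},
        {"w": [3.0, 4.0], "b": [3.0]},
        {"w": [2.0, 0.0], "b": [0.0]},
    ]
    averager = RunningAverage()
    for snapshot in runs:
        averaged = averager.update(snapshot)
        # In a ZS-XLT setting, one would evaluate `averaged` here after each run.
    print(averaged)
```

With these three toy snapshots, the final average is `{"w": [2.0, 2.0], "b": [1.0]}`, which is the element-wise mean. The incremental update stores only the current average and the run count, not every snapshot. That saving matters when each snapshot is a full multilingual encoder.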

Anthology ID: 2023.findings-emnlp.815
Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
Month: December
Year: 2023
Address: Singapore
Editors: Houda Bouamor, Juan Pino, Kalika Bali
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 12186–12193
URL: https://aclanthology.org/2023.findings-emnlp.815
DOI: 10.18653/v1/2023.findings-emnlp.815
Cite (ACL): Fabian David Schmidt, Ivan Vulić, and Goran Glavaš. 2023. One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging for Cross-Lingual Transfer. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 12186–12193, Singapore. Association for Computational Linguistics.
Cite (Informal): One For All & All For One: Bypassing Hyperparameter Tuning with Model Averaging for Cross-Lingual Transfer (Schmidt et al., Findings 2023)
PDF: https://preview.aclanthology.org/naacl24-info/2023.findings-emnlp.815.pdf
