MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer

Haotian Ye, Yihong Liu, Chunlan Ma, Hinrich Schütze


Abstract
Transformer-based pre-trained language models (PLMs) have achieved remarkable performance in various natural language processing (NLP) tasks. However, pre-training such models requires considerable resources that are available almost exclusively for high-resource languages. In contrast, static word embeddings are easier to train in terms of both computing resources and the amount of data required. In this paper, we introduce MoSECroT (Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer), a novel and challenging task that is especially relevant to low-resource languages for which static word embeddings are available. To tackle the task, we present the first framework that leverages relative representations to construct a common space for the embeddings of a source-language PLM and the static word embeddings of a target language. In this way, we can train the PLM on source-language training data and perform zero-shot transfer to the target language by simply swapping the embedding layer. However, through extensive experiments on two classification datasets, we show that although our proposed framework is competitive with weak baselines when addressing MoSECroT, it fails to achieve competitive results compared with strong baselines. In this paper, we attempt to explain this negative result and offer several thoughts on possible improvements.
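
The sketch below illustrates the general idea of relative representations used for stitching: each source and target embedding is re-expressed as its similarities to a shared set of anchor word pairs, so that both vocabularies end up in a common, dimension-matched space. All names, matrix shapes, and the random data here are hypothetical placeholders for illustration, not the authors' released code, and the sketch omits details such as anchor selection and how the PLM consumes the relative space.

```python
# Minimal sketch of relative representations for cross-lingual embedding stitching,
# assuming hypothetical source/target embedding matrices and aligned anchor indices.
import numpy as np

def relative_representation(emb: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """Represent each row of `emb` by its cosine similarity to every anchor row."""
    emb_n = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    anc_n = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return emb_n @ anc_n.T  # shape: (vocab_size, num_anchors)

# Hypothetical inputs: the source-language PLM input embeddings and the
# target-language static word embeddings, plus indices of anchor words that
# are translations of each other (e.g. from a bilingual dictionary).
src_emb = np.random.randn(30000, 768)    # source PLM embedding matrix
tgt_emb = np.random.randn(50000, 300)    # target static word embeddings
src_anchor_ids = np.arange(1000)         # anchor rows in the source vocabulary
tgt_anchor_ids = np.arange(1000)         # aligned anchor rows in the target vocabulary

# Both sides are mapped into the same |anchors|-dimensional relative space,
# so the target matrix can, in principle, replace the source embedding layer
# of the fine-tuned PLM at inference time (zero-shot transfer by swapping).
src_rel = relative_representation(src_emb, src_emb[src_anchor_ids])
tgt_rel = relative_representation(tgt_emb, tgt_emb[tgt_anchor_ids])
```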
Anthology ID:
2024.insights-1.1
Volume:
Proceedings of the Fifth Workshop on Insights from Negative Results in NLP
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Shabnam Tafreshi, Arjun Akula, João Sedoc, Aleksandr Drozd, Anna Rogers, Anna Rumshisky
Venues:
insights | WS
Publisher:
Association for Computational Linguistics
Pages:
1–7
URL:
https://aclanthology.org/2024.insights-1.1
Cite (ACL):
Haotian Ye, Yihong Liu, Chunlan Ma, and Hinrich Schütze. 2024. MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer. In Proceedings of the Fifth Workshop on Insights from Negative Results in NLP, pages 1–7, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
MoSECroT: Model Stitching with Static Word Embeddings for Crosslingual Zero-shot Transfer (Ye et al., insights-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.insights-1.1.pdf