Multilingual BERT Post-Pretraining Alignment

Lin Pan, Chung-Wei Hang, Haode Qi, Abhishek Shah, Saloni Potdar, Mo Yu


Abstract
We propose a simple method to align multilingual contextual embeddings as a post-pretraining step for improved cross-lingual transferability of pretrained language models. Using parallel data, our method aligns embeddings at the word level through the recently proposed Translation Language Modeling objective, and at the sentence level via contrastive learning and random input shuffling. We also perform sentence-level code-switching with English when finetuning on downstream tasks. On XNLI, our best model (initialized from mBERT) improves over mBERT by 4.7% in the zero-shot setting and achieves results comparable to XLM in the translate-train setting, while using less than 18% of the same parallel data and 31% fewer model parameters. On MLQA, our model outperforms XLM-R_Base, which has 57% more parameters than ours.
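The sentence-level alignment the abstract describes can be illustrated with a minimal contrastive-loss sketch in PyTorch, assuming in-batch negatives, pooled sentence embeddings, and cosine similarity with a temperature. The function name, temperature value, and pooling choice here are illustrative assumptions; the paper's exact contrastive formulation may differ.

import torch
import torch.nn.functional as F

def sentence_alignment_loss(src_emb: torch.Tensor,
                            tgt_emb: torch.Tensor,
                            temperature: float = 0.05) -> torch.Tensor:
    """InfoNCE-style contrastive loss over a batch of parallel sentences.

    src_emb, tgt_emb: (batch, dim) pooled encoder outputs; row i of each
    tensor comes from the two sides of the same translation pair.
    """
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    # Cosine-similarity logits: entry (i, j) compares source sentence i
    # with target-language sentence j.
    logits = src @ tgt.t() / temperature
    labels = torch.arange(src.size(0), device=src.device)
    # Each sentence must identify its own translation among in-batch negatives.
    return F.cross_entropy(logits, labels)

# Usage with dummy embeddings (e.g., mean-pooled mBERT outputs):
src = torch.randn(8, 768)  # English sentences
tgt = torch.randn(8, 768)  # their aligned translations
loss = sentence_alignment_loss(src, tgt)

Each sentence is pulled toward its translation and pushed away from the other translations in the batch, which is one standard way to realize sentence-level contrastive alignment from parallel data.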
Anthology ID: 2021.naacl-main.20
Volume: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month: June
Year: 2021
Address: Online
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 210–219
URL: https://aclanthology.org/2021.naacl-main.20
DOI: 10.18653/v1/2021.naacl-main.20
Cite (ACL): Lin Pan, Chung-Wei Hang, Haode Qi, Abhishek Shah, Saloni Potdar, and Mo Yu. 2021. Multilingual BERT Post-Pretraining Alignment. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 210–219, Online. Association for Computational Linguistics.
Cite (Informal): Multilingual BERT Post-Pretraining Alignment (Pan et al., NAACL 2021)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2021.naacl-main.20.pdf
Video: https://preview.aclanthology.org/emnlp-22-attachments/2021.naacl-main.20.mp4
Data: ImageNet, MLQA, MultiNLI, XNLI