Improving Cross-lingual Text Classification with Zero-shot Instance-Weighting

Irene Li, Prithviraj Sen, Huaiyu Zhu, Yunyao Li, Dragomir Radev


Abstract
Cross-lingual text classification (CLTC) is a challenging task made even harder by the lack of labeled data in low-resource languages. In this paper, we propose zero-shot instance-weighting, a general, model-agnostic zero-shot learning framework that improves CLTC by weighting source instances. It adds a module on top of pre-trained language models that computes instance weights from similarity, thus aligning each source instance to the target language. During training, the framework updates parameters using gradient descent weighted by these instance weights. We evaluate the framework on seven target languages across three fundamental tasks and show its effectiveness and extensibility: it improves F1 score by up to 4% in single-source transfer and 8% in multi-source transfer. To the best of our knowledge, our method is the first to apply instance weighting in zero-shot CLTC. It is simple yet effective and easily extensible to multi-source transfer.
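As a rough illustration of the idea in the abstract, the sketch below weights each source instance by its similarity to the target language and then takes one weighted gradient step. It is not the authors' implementation. The abstract does not specify the similarity function, the normalization, or the classifier, so everything here is an assumption made for illustration: cosine similarity to a target-language centroid, weights normalized to sum to one, a binary logistic classifier, and the hypothetical helper names instance_weights and weighted_sgd_step. Plain Python lists stand in for the representations a pre-trained language model would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def instance_weights(source_vecs, target_vecs):
    """Assumed scheme: weight each source instance by its (clipped) cosine
    similarity to the centroid of unlabeled target-language vectors."""
    dim = len(target_vecs[0])
    centroid = [sum(v[i] for v in target_vecs) / len(target_vecs) for i in range(dim)]
    sims = [max(cosine(s, centroid), 0.0) for s in source_vecs]
    total = sum(sims)
    if total == 0:
        # Fall back to uniform weights when nothing is similar to the target.
        return [1.0 / len(sims)] * len(sims)
    return [s / total for s in sims]


def weighted_sgd_step(params, source_vecs, labels, weights, lr=0.1):
    """One gradient-descent step on a binary logistic loss in which each
    source instance's gradient is scaled by its instance weight."""
    grad = [0.0] * len(params)
    for x, y, w in zip(source_vecs, labels, weights):
        z = sum(p * xi for p, xi in zip(params, x))
        pred = 1.0 / (1.0 + math.exp(-z))
        for i, xi in enumerate(x):
            grad[i] += w * (pred - y) * xi
    return [p - lr * g for p, g in zip(params, grad)]


# Toy data: two labeled source instances, two unlabeled target instances.
source = [[1.0, 0.0], [0.0, 1.0]]
source_labels = [1, 0]
target = [[0.9, 0.1], [1.0, 0.0]]

weights = instance_weights(source, target)
params = weighted_sgd_step([0.0, 0.0], source, source_labels, weights)
```

In this toy setup the first source instance lies closer to the target centroid, so it receives the larger weight and dominates the parameter update. No target-language labels are used at any point, which is what makes the setting zero-shot.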
Anthology ID:
2021.repl4nlp-1.1
Volume:
Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)
Month:
August
Year:
2021
Address:
Online
Venue:
RepL4NLP
Publisher:
Association for Computational Linguistics
Pages:
1–7
URL:
https://aclanthology.org/2021.repl4nlp-1.1
DOI:
10.18653/v1/2021.repl4nlp-1.1
Cite (ACL):
Irene Li, Prithviraj Sen, Huaiyu Zhu, Yunyao Li, and Dragomir Radev. 2021. Improving Cross-lingual Text Classification with Zero-shot Instance-Weighting. In Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021), pages 1–7, Online. Association for Computational Linguistics.
Cite (Informal):
Improving Cross-lingual Text Classification with Zero-shot Instance-Weighting (Li et al., RepL4NLP 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.repl4nlp-1.1.pdf
Data
MLDoc