Practical Transformer-based Multilingual Text Classification

Cindy Wang, Michele Banko


Abstract
Transformer-based methods are appealing for multilingual text classification, but common research benchmarks like XNLI (Conneau et al., 2018) do not reflect the data availability and task variety of industry applications. We present an empirical comparison of transformer-based text classification models in a variety of practical monolingual and multilingual pretraining and fine-tuning settings. We evaluate these methods on two distinct tasks in five different languages. Departing from prior work, our results show that multilingual language models can outperform monolingual ones in some downstream tasks and target languages. We additionally show that practical modifications such as task- and domain-adaptive pretraining and data augmentation can improve classification performance without the need for additional labeled data.
Anthology ID:
2021.naacl-industry.16
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers
Month:
June
Year:
2021
Address:
Online
Editors:
Young-bum Kim, Yunyao Li, Owen Rambow
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
121–129
URL:
https://aclanthology.org/2021.naacl-industry.16
DOI:
10.18653/v1/2021.naacl-industry.16
Cite (ACL):
Cindy Wang and Michele Banko. 2021. Practical Transformer-based Multilingual Text Classification. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, pages 121–129, Online. Association for Computational Linguistics.
Cite (Informal):
Practical Transformer-based Multilingual Text Classification (Wang & Banko, NAACL 2021)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2021.naacl-industry.16.pdf
Video:
https://preview.aclanthology.org/nschneid-patch-2/2021.naacl-industry.16.mp4
Code:
sentropytechnologies/hateval2019-relabeled