Transfer Learning Improves French Cross-Domain Dialect Identification: NRC @ VarDial 2022

Gabriel Bernier-Colborne; Serge Léger; Cyril Goutte

Abstract

We describe the systems developed by the National Research Council Canada for the French Cross-Domain Dialect Identification shared task at the 2022 VarDial evaluation campaign. We evaluated two approaches to this task: SVM and probabilistic classifiers that use n-grams as features and are trained from scratch on the data provided, and a pre-trained French language model, CamemBERT, that we fine-tuned on the dialect identification task. The latter method improved the macro-F1 score on the test set from 0.344 to 0.430 (a 25% relative increase), which indicates that transfer learning can be helpful for dialect identification.
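The reported figures can be checked with a short sketch. The code below is not the authors' evaluation script; it is a minimal illustration of how macro-F1 (the unweighted mean of per-class F1 scores) is computed and of how a change from 0.344 to 0.430 corresponds to a 25% relative gain. The function names `macro_f1` and `relative_gain`, and the toy labels and predictions, are assumptions made for illustration only.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all classes seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Toy data (hypothetical labels and predictions, not taken from the shared task).
gold = ["FR", "FR", "CH", "BE", "CA", "CA"]
pred = ["FR", "CH", "CH", "FR", "CA", "CA"]

toy_score = macro_f1(gold, pred)

# Scores quoted in the abstract: n-gram baseline vs. fine-tuned CamemBERT.
gain = relative_gain(0.344, 0.430)
print(f"toy macro-F1: {toy_score:.3f}, relative gain: {gain:.0%}")
```

Because macro-F1 weights every class equally, a class that is never predicted correctly (such as the toy "BE" class here) pulls the average down as much as a frequent class would, which is why it is a common choice when dialect classes are imbalanced.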

Anthology ID: 2022.vardial-1.12
Volume: Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects
Month: October
Year: 2022
Address: Gyeongju, Republic of Korea
Editors: Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Preslav Nakov, Jörg Tiedemann, Marcos Zampieri
Venue: VarDial
Publisher: Association for Computational Linguistics
Pages: 109–118
URL: https://aclanthology.org/2022.vardial-1.12
Cite (ACL): Gabriel Bernier-Colborne, Serge Leger, and Cyril Goutte. 2022. Transfer Learning Improves French Cross-Domain Dialect Identification: NRC @ VarDial 2022. In Proceedings of the Ninth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 109–118, Gyeongju, Republic of Korea. Association for Computational Linguistics.
Cite (Informal): Transfer Learning Improves French Cross-Domain Dialect Identification: NRC @ VarDial 2022 (Bernier-Colborne et al., VarDial 2022)
PDF: https://preview.aclanthology.org/nschneid-patch-3/2022.vardial-1.12.pdf
