Dialect Identification through Adversarial Learning and Knowledge Distillation on Romanian BERT

George-Eduard Zaharia, Andrei-Marius Avram, Dumitru-Clementin Cercel, Traian Rebedea


Abstract
Dialect identification is a task with applicability in a vast array of domains, ranging from automatic speech recognition to opinion mining. This work presents the architectures we used for the VarDial 2021 Romanian Dialect Identification subtask. We introduce a series of solutions based on Romanian or multilingual Transformers, as well as adversarial training techniques. We also experiment with a knowledge distillation tool to check whether a smaller model can maintain the performance of our best approach. Our best solution obtained a weighted F1-score of 0.7324, placing 2nd on the leaderboard.
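The weighted F1-score reported above averages the per-class F1 scores, weighting each class by its number of true instances (its support), so the more frequent dialect contributes more to the final number. The minimal pure-Python sketch below illustrates how that aggregation works. The function name `weighted_f1` and the labels `RO` and `MD` are hypothetical illustrations, not taken from the authors' code or the shared-task data. On the same inputs it should agree with scikit-learn's `f1_score(..., average="weighted")`.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 (hypothetical helper for illustration)."""
    support = Counter(y_true)  # number of gold instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        # Count true positives and false positives for this class.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / n  # n > 0 because the label occurs in y_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (n / total) * f1  # weight the class F1 by its share of the gold labels
    return score


# Hypothetical two-class example: RO has F1 = 0.8 (support 3), MD has F1 = 2/3 (support 1).
gold = ["RO", "RO", "RO", "MD"]
pred = ["RO", "RO", "MD", "MD"]
print(round(weighted_f1(gold, pred), 4))  # 0.75 * 0.8 + 0.25 * 2/3 = 0.7667
```

Classes that appear only in the predictions have zero support and therefore zero weight, which is why iterating over the gold labels alone is sufficient here.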
Anthology ID: 2021.vardial-1.13
Volume: Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
Month: April
Year: 2021
Address: Kyiv, Ukraine
Editors: Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer, Tommi Jauhiainen
Venue: VarDial
Publisher: Association for Computational Linguistics
Pages: 113–119
URL: https://aclanthology.org/2021.vardial-1.13
Cite (ACL): George-Eduard Zaharia, Andrei-Marius Avram, Dumitru-Clementin Cercel, and Traian Rebedea. 2021. Dialect Identification through Adversarial Learning and Knowledge Distillation on Romanian BERT. In Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 113–119, Kyiv, Ukraine. Association for Computational Linguistics.
Cite (Informal): Dialect Identification through Adversarial Learning and Knowledge Distillation on Romanian BERT (Zaharia et al., VarDial 2021)
PDF: https://preview.aclanthology.org/ingest-2024-clasp/2021.vardial-1.13.pdf