Dialect Identification through Adversarial Learning and Knowledge Distillation on Romanian BERT
George-Eduard Zaharia, Andrei-Marius Avram, Dumitru-Clementin Cercel, Traian Rebedea
Abstract
Dialect identification is a task with applicability in a vast array of domains, ranging from automatic speech recognition to opinion mining. This work presents our architectures used for the VarDial 2021 Romanian Dialect Identification subtask. We introduced a series of solutions based on Romanian or multilingual Transformers, as well as adversarial training techniques. At the same time, we experimented with a knowledge distillation tool in order to check whether a smaller model can maintain the performance of our best approach. Our best solution managed to obtain a weighted F1-score of 0.7324, allowing us to obtain the 2nd place on the leaderboard.
- Anthology ID:
- 2021.vardial-1.13
- Volume:
- Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
- Month:
- April
- Year:
- 2021
- Address:
- Kyiv, Ukraine
- Editors:
- Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer, Tommi Jauhiainen
- Venue:
- VarDial
- Publisher:
- Association for Computational Linguistics
- Pages:
- 113–119
- URL:
- https://aclanthology.org/2021.vardial-1.13
- Cite (ACL):
- George-Eduard Zaharia, Andrei-Marius Avram, Dumitru-Clementin Cercel, and Traian Rebedea. 2021. Dialect Identification through Adversarial Learning and Knowledge Distillation on Romanian BERT. In Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 113–119, Kyiv, Ukraine. Association for Computational Linguistics.
- Cite (Informal):
- Dialect Identification through Adversarial Learning and Knowledge Distillation on Romanian BERT (Zaharia et al., VarDial 2021)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2021.vardial-1.13.pdf