Exploring the Power of Romanian BERT for Dialect Identification
George-Eduard Zaharia, Andrei-Marius Avram, Dumitru-Clementin Cercel, Traian Rebedea
Abstract
Dialect identification is key to improving a range of tasks, such as opinion mining, since the location of the speaker can greatly influence the attitude towards a subject. In this work, we describe the systems developed by our team for VarDial 2020: Romanian Dialect Identification, a task created specifically to challenge participants to address this issue. More specifically, we introduce a series of neural systems based on Transformers that combine a BERT model pre-trained exclusively on the Romanian language with techniques such as adversarial training or character-level embeddings. With these approaches, we obtained a macro F1 score of 0.6475 on the test dataset, which ranked us 5th out of 8 participating teams.
- Anthology ID:
- 2020.vardial-1.22
- Volume:
- Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Editors:
- Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer
- Venue:
- VarDial
- Publisher:
- International Committee on Computational Linguistics (ICCL)
- Pages:
- 232–241
- URL:
- https://aclanthology.org/2020.vardial-1.22
- Cite (ACL):
- George-Eduard Zaharia, Andrei-Marius Avram, Dumitru-Clementin Cercel, and Traian Rebedea. 2020. Exploring the Power of Romanian BERT for Dialect Identification. In Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects, pages 232–241, Barcelona, Spain (Online). International Committee on Computational Linguistics (ICCL).
- Cite (Informal):
- Exploring the Power of Romanian BERT for Dialect Identification (Zaharia et al., VarDial 2020)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-3/2020.vardial-1.22.pdf
- Data
- MOROCO, RONEC