HeLI-based Experiments in Swiss German Dialect Identification

Tommi Jauhiainen, Heidi Jauhiainen, Krister Lindén


Abstract
In this paper, we present the experiments and results of the SUKI team in the German Dialect Identification shared task of the VarDial 2018 Evaluation Campaign. Our submission, which used HeLI with adaptive language models, obtained the best results in the shared task with a macro F1-score of 0.686, clearly higher than the other submitted results. Given the degree of domain difference between the datasets of the shared task, it might not be possible to reach as high an F1-score without some form of unsupervised adaptation on the test set. We describe the methods used in detail, as well as some additional experiments carried out during the shared task.
Anthology ID:
W18-3929
Volume:
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)
Month:
August
Year:
2018
Address:
Santa Fe, New Mexico, USA
Editors:
Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Shervin Malmasi, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
254–262
URL:
https://aclanthology.org/W18-3929
Cite (ACL):
Tommi Jauhiainen, Heidi Jauhiainen, and Krister Lindén. 2018. HeLI-based Experiments in Swiss German Dialect Identification. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 254–262, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
Cite (Informal):
HeLI-based Experiments in Swiss German Dialect Identification (Jauhiainen et al., VarDial 2018)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/W18-3929.pdf