Discriminating between Mandarin Chinese and Swiss-German varieties using adaptive language models

Tommi Jauhiainen, Krister Lindén, Heidi Jauhiainen


Abstract
This paper describes the language identification systems used by the SUKI team in the Discriminating between the Mainland and Taiwan variation of Mandarin Chinese (DMT) and the German Dialect Identification (GDI) shared tasks, which were held as part of the third VarDial Evaluation Campaign. The DMT shared task included two separate tracks, one for the simplified Chinese script and one for the traditional Chinese script. We submitted three runs on both tracks of the DMT task as well as on the GDI task. We won the traditional Chinese track using Naive Bayes with language model adaptation, came second on GDI with an adaptive version of the HeLI 2.0 method, and came third on the simplified Chinese track, again using the adaptive Naive Bayes.
Anthology ID:
W19-1419
Volume:
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
Month:
June
Year:
2019
Address:
Ann Arbor, Michigan
Editors:
Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
Venue:
VarDial
Publisher:
Association for Computational Linguistics
Pages:
178–187
URL:
https://aclanthology.org/W19-1419
DOI:
10.18653/v1/W19-1419
Cite (ACL):
Tommi Jauhiainen, Krister Lindén, and Heidi Jauhiainen. 2019. Discriminating between Mandarin Chinese and Swiss-German varieties using adaptive language models. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 178–187, Ann Arbor, Michigan. Association for Computational Linguistics.
Cite (Informal):
Discriminating between Mandarin Chinese and Swiss-German varieties using adaptive language models (Jauhiainen et al., VarDial 2019)
PDF:
https://preview.aclanthology.org/nschneid-patch-3/W19-1419.pdf