Abstract
Automatic language identification is a challenging problem, and discriminating between closely related languages is especially difficult. This paper presents a machine learning approach to automatic language identification for the Nordic languages, which are often miscategorised by existing state-of-the-art tools. Concretely, we focus on discriminating between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese and Icelandic.
- Anthology ID:
- 2021.vardial-1.8
- Volume:
- Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects
- Month:
- April
- Year:
- 2021
- Address:
- Kyiv, Ukraine
- Editors:
- Marcos Zampieri, Preslav Nakov, Nikola Ljubešić, Jörg Tiedemann, Yves Scherrer, Tommi Jauhiainen
- Venue:
- VarDial
- Publisher:
- Association for Computational Linguistics
- Pages:
- 67–75
- URL:
- https://aclanthology.org/2021.vardial-1.8
- Cite (ACL):
- René Haas and Leon Derczynski. 2021. Discriminating Between Similar Nordic Languages. In Proceedings of the Eighth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 67–75, Kyiv, Ukraine. Association for Computational Linguistics.
- Cite (Informal):
- Discriminating Between Similar Nordic Languages (Haas & Derczynski, VarDial 2021)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2021.vardial-1.8.pdf
- Code
- StrombergNLP/NordicDSL
- Data
- Nordic Language Identification