NoMusic - The Norwegian Multi-Dialectal Slot and Intent Detection Corpus

Petter Mæhlum, Yves Scherrer


Abstract
This paper presents a new textual resource for Norwegian and its dialects. The NoMusic corpus contains Norwegian translations of the xSID dataset, an evaluation dataset for spoken language understanding (slot and intent detection). The translations cover Norwegian Bokmål as well as eight dialects from three of the four major Norwegian dialect areas. To our knowledge, this is the first multi-parallel resource for written Norwegian dialects, and the first evaluation dataset for slot and intent detection focusing on non-standard Norwegian varieties. In this paper, we describe the annotation process and provide some analyses of the types of linguistic variation that can be found in the dataset.
Anthology ID:
2024.vardial-1.9
Volume:
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Marcos Zampieri, Preslav Nakov, Jörg Tiedemann
Venues:
VarDial | WS
Publisher:
Association for Computational Linguistics
Pages:
107–116
URL:
https://aclanthology.org/2024.vardial-1.9
Cite (ACL):
Petter Mæhlum and Yves Scherrer. 2024. NoMusic - The Norwegian Multi-Dialectal Slot and Intent Detection Corpus. In Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024), pages 107–116, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
NoMusic - The Norwegian Multi-Dialectal Slot and Intent Detection Corpus (Mæhlum & Scherrer, VarDial-WS 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.vardial-1.9.pdf
Supplementary material:
 2024.vardial-1.9.SupplementaryMaterial.txt