NorQuAD: Norwegian Question Answering Dataset

Sardana Ivanova, Fredrik Andreassen, Matias Jentoft, Sondre Wold, Lilja Øvrelid


Abstract
In this paper, we present NorQuAD, the first Norwegian question answering dataset for machine reading comprehension. The dataset consists of 4,752 manually created question-answer pairs. We detail the data collection procedure and present statistics of the dataset. We also benchmark several multilingual and Norwegian monolingual language models on the dataset and compare them against human performance. The dataset will be made freely available.
Anthology ID:
2023.nodalida-1.17
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
159–168
URL:
https://aclanthology.org/2023.nodalida-1.17
Cite (ACL):
Sardana Ivanova, Fredrik Andreassen, Matias Jentoft, Sondre Wold, and Lilja Øvrelid. 2023. NorQuAD: Norwegian Question Answering Dataset. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 159–168, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
NorQuAD: Norwegian Question Answering Dataset (Ivanova et al., NoDaLiDa 2023)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2023.nodalida-1.17.pdf