Natural Questions in Icelandic

Vésteinn Snæbjarnarson, Hafsteinn Einarsson


Abstract
We present the first extractive question answering (QA) dataset for Icelandic, Natural Questions in Icelandic (NQiI). Such datasets are important for developing and evaluating Icelandic QA systems. They also aid the development of QA methods that must work across a wide range of morphologically and grammatically different languages in a multilingual setting. The dataset was created by asking contributors to come up with questions they would like to know the answer to. Later, they were tasked with finding answers to each other's questions, following a previously published methodology. The questions are natural in the sense that they are real questions posed out of interest in knowing the answer. The complete dataset contains 18 thousand labeled entries, of which 5,568 are directly suitable for training an extractive QA system for Icelandic. We demonstrate the value of the dataset as a resource for Icelandic by creating and evaluating a system capable of extractive QA in Icelandic.
Anthology ID:
2022.lrec-1.477
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4488–4496
URL:
https://aclanthology.org/2022.lrec-1.477
Cite (ACL):
Vésteinn Snæbjarnarson and Hafsteinn Einarsson. 2022. Natural Questions in Icelandic. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4488–4496, Marseille, France. European Language Resources Association.
Cite (Informal):
Natural Questions in Icelandic (Snæbjarnarson & Einarsson, LREC 2022)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2022.lrec-1.477.pdf