IceBATS: An Icelandic Adaptation of the Bigger Analogy Test Set

Steinunn Rut Friðriksdóttir, Hjalti Daníelsson, Steinþór Steingrímsson, Einar Sigurdsson


Abstract
Word embedding models have become commonplace in a wide range of NLP applications. To train and use the best possible models, accurate evaluation is needed. For extrinsic evaluation of word embedding models, analogy evaluation sets have been shown to be a good quality estimator. We introduce an Icelandic adaptation of a large analogy dataset, BATS, evaluate it on three different word embedding models, and show that our evaluation set is well suited to measuring the capabilities of such models.
Anthology ID:
2022.lrec-1.449
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
4227–4234
URL:
https://aclanthology.org/2022.lrec-1.449
Cite (ACL):
Steinunn Rut Friðriksdóttir, Hjalti Daníelsson, Steinþór Steingrímsson, and Einar Sigurdsson. 2022. IceBATS: An Icelandic Adaptation of the Bigger Analogy Test Set. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4227–4234, Marseille, France. European Language Resources Association.
Cite (Informal):
IceBATS: An Icelandic Adaptation of the Bigger Analogy Test Set (Friðriksdóttir et al., LREC 2022)
PDF:
https://preview.aclanthology.org/nschneid-patch-1/2022.lrec-1.449.pdf