Einar Sigurðsson

Also published as: Einar Sigurdsson


2025

Up to Par? MT Systems Take a Shot at Sports Terminology
Einar Sigurdsson | Magnús Magnússon | Atli Jasonarson | Steinthor Steingrimsson
Proceedings of the Tenth Conference on Machine Translation

We present a submission to the WMT25 test suite subtask, focusing on the capabilities of MT systems to translate sports-related language. Although many sports attract extensive media attention and feature a rich, polysemous language, often shaped by active neologism and community-driven translations, the sports domain has received relatively little attention in MT research. In English-Icelandic automatic translations, sports-specific vocabulary often appears to be mistranslated. Our test suite is designed to test whether this observation holds. We evaluate 34 systems, both automatically and manually, and find that sports language poses challenges, to varying degrees, for all the systems.
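As a rough illustration of how a terminology-focused test suite can be scored automatically, the sketch below checks whether each MT hypothesis contains any accepted Icelandic rendering of the targeted sports term. The function name, the example sentence, and the accepted-term sets are hypothetical placeholders, not the authors' actual data or evaluation script.

# Hedged sketch: term-level accuracy over a list of MT outputs.
def term_accuracy(outputs, expected_terms):
    """outputs: list of MT hypotheses; expected_terms: list of sets of
    acceptable translations for the term targeted in each sentence."""
    hits = 0
    for hyp, accepted in zip(outputs, expected_terms):
        if any(term.lower() in hyp.lower() for term in accepted):
            hits += 1
    return hits / len(outputs) if outputs else 0.0

outputs = ["Hann skoraði þrennu í leiknum."]
expected = [{"þrenna", "þrennu"}]  # accepted forms of "hat-trick" (illustrative)
print(f"Term accuracy: {term_accuracy(outputs, expected):.2f}")

Manual evaluation would still be needed, as in the paper, to catch renderings that contain the right term but misuse it in context.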

2023

Generating Errors: OCR Post-Processing for Icelandic
Atli Jasonarson | Steinþór Steingrímsson | Einar Sigurðsson | Árni Magnússon | Finnur Ingimundarson
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)

We describe work on enhancing the performance of transformer-based encoder-decoder models for OCR post-correction on modern and historical Icelandic texts, where OCRed data are scarce. We trained six models, four from scratch and two fine-tuned versions of Google’s ByT5, on a combination of real data and texts populated with artificially generated errors. Our results show that the models trained from scratch, as opposed to the fine-tuned versions, benefited the most from the addition of artificially generated errors.
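A minimal sketch of the general idea of populating clean text with artificial OCR-style errors is shown below. The confusion pairs, error rate, and function name are assumptions for illustration only, not the error model used in the paper.

import random

# Hypothetical character-level confusion pairs typical of OCR on Icelandic text.
CONFUSIONS = {"rn": "m", "ð": "d", "í": "i", "li": "h", "é": "e"}

def add_ocr_noise(text, error_rate=0.05, seed=0):
    """Replace each occurrence of a confusable sequence with probability error_rate."""
    rng = random.Random(seed)
    out = text
    for src, tgt in CONFUSIONS.items():
        pieces = out.split(src)
        rebuilt = pieces[0]
        for piece in pieces[1:]:
            rebuilt += (tgt if rng.random() < error_rate else src) + piece
        out = rebuilt
    return out

clean = "Það var kalt í veðri um morguninn."
print(add_ocr_noise(clean, error_rate=0.5))

Pairs of (noised, clean) sentences produced this way can then serve as additional training data for an encoder-decoder post-correction model alongside real OCRed text.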

2022

IceBATS: An Icelandic Adaptation of the Bigger Analogy Test Set
Steinunn Rut Friðriksdóttir | Hjalti Daníelsson | Steinþór Steingrímsson | Einar Sigurdsson
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Word embedding models have become commonplace in a wide range of NLP applications. In order to train and use the best possible models, accurate evaluation is needed. For intrinsic evaluation of word embedding models, analogy evaluation sets have been shown to be a good quality estimator. We introduce an Icelandic adaptation of a large analogy dataset, BATS, use it to evaluate three different word embedding models, and show that our evaluation set is well suited to measuring the capabilities of such models.
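For readers unfamiliar with analogy evaluation, the sketch below solves a single analogy question (a : b :: c : ?) by vector arithmetic and cosine similarity. The toy vocabulary and random vectors are stand-ins for a trained Icelandic embedding model and are not part of IceBATS itself.

import numpy as np

# Hedged sketch: toy vocabulary with random vectors instead of a trained model.
rng = np.random.default_rng(0)
vocab = ["maður", "kona", "kóngur", "drottning"]
vectors = {w: rng.normal(size=50) for w in vocab}

def solve_analogy(a, b, c, vectors):
    """Return the word d maximising cos(b - a + c, d), excluding a, b, c."""
    target = vectors[b] - vectors[a] + vectors[c]
    best, best_sim = None, -np.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue
        sim = np.dot(target, vec) / (np.linalg.norm(target) * np.linalg.norm(vec))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(solve_analogy("maður", "kona", "kóngur", vectors))  # ideally "drottning"

An analogy test set such as IceBATS scores a model by the fraction of such questions it answers correctly across many relation categories.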