Vilhjálmur Þorsteinsson

Also published as: Vilhjalmur Thorsteinsson


2021

pdf bib
Miðeind’s WMT 2021 Submission
Haukur Barri Símonarson | Vésteinn Snæbjarnarson | Pétur Orri Ragnarson | Haukur Jónsson | Vilhjalmur Thorsteinsson
Proceedings of the Sixth Conference on Machine Translation

We present Miðeind’s submission for the English→Icelandic and Icelandic→English subsets of the 2021 WMT news translation task. Transformer-base models are trained for translation on parallel data to generate backtranslations teratively. A pretrained mBART-25 model is then adapted for translation using parallel data as well as the last backtranslation iteration. This adapted pretrained model is then used to re-generate backtranslations, and the training of the adapted model is continued.

2019

pdf bib
A Wide-Coverage Context-Free Grammar for Icelandic and an Accompanying Parsing System
Vilhjálmur Þorsteinsson | Hulda Óladóttir | Hrafn Loftsson
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

We present an open-source, wide-coverage context-free grammar (CFG) for Icelandic, and an accompanying parsing system. The grammar has over 5,600 nonterminals, 4,600 terminals and 19,000 productions in fully expanded form, with feature agreement constraints for case, gender, number and person. The parsing system consists of an enhanced Earley-based parser and a mechanism to select best-scoring parse trees from shared packed parse forests. Our parsing system is able to parse about 90% of all sentences in articles published on the main Icelandic news websites. Preliminary evaluation with evalb shows an F-measure of 70.72% on parsed sentences. Our system demonstrates that parsing a morphologically rich language using a wide-coverage CFG can be practical.