Tommi Nieminen

2023

pdf abs
OPUS-CAT Terminology Systems for the WMT23 Terminology Shared Task
Tommi Nieminen
Proceedings of the Eighth Conference on Machine Translation

This paper describes the submission of the OPUS-CAT project to the WMT 2023 terminology shared task. We trained systems for all three language pairs included in the task. All systems were trained using the same training pipeline with identical methods. Support for terminology was implemented by using the currently popular method of annotating source language terms in the training data with the corresponding target language terms.

2021

pdf abs
OPUS-CAT: Desktop NMT with CAT integration and local fine-tuning
Tommi Nieminen
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations

OPUS-CAT is a collection of software which enables translators to use neural machine translation in computer-assisted translation tools without exposing themselves to security and confidentiality risks inherent in online machine translation. OPUS-CAT uses the public OPUS-MT machine translation models, which are available for over a thousand language pairs. The generic OPUS-MT models can be fine-tuned with OPUS-CAT on the desktop using data for a specific client or domain.

2020

This paper presents FISKMÖ, a project that focuses on the development of resources and tools for cross-linguistic research and machine translation between Finnish and Swedish. The goal of the project is the compilation of a massive parallel corpus out of translated material collected from web sources, public and private organisations and language service providers in Finland with its two official languages. The project also aims at the development of open and freely accessible translation services for those two languages for the general purpose and for domain-specific use. We have released new data sets with over 3 million translation units, a benchmark test set for MT development, pre-trained neural MT models with high coverage and competitive performance and a self-contained MT plugin for a popular CAT tool. The latter enables offline translation without dependencies on external services making it possible to work with highly sensitive data without compromising security concerns.

2018

pdf abs
The University of Helsinki submissions to the WMT18 news task
Alessandro Raganato | Yves Scherrer | Tommi Nieminen | Arvi Hurskainen | Jörg Tiedemann
Proceedings of the Third Conference on Machine Translation: Shared Task Papers

This paper describes the University of Helsinki’s submissions to the WMT18 shared news translation task for English-Finnish and English-Estonian, in both directions. This year, our main submissions employ a novel neural architecture, the Transformer, using the open-source OpenNMT framework. Our experiments couple domain labeling and fine tuned multilingual models with shared vocabularies between the source and target language, using the provided parallel data of the shared task and additional back-translations. Finally, we compare, for the English-to-Finnish case, the effectiveness of different machine translation architectures, starting from a rule-based approach to our best neural model, analyzing the output and highlighting future research.

Progress in the quality of machine translation output calls for new automatic evaluation procedures and metrics. In this paper, we extend the Morpheval protocol introduced by Burlot and Yvon (2017) for the English-to-Czech and English-to-Latvian translation directions to three additional language pairs, and report its use to analyze the results of WMT 2018’s participants for these language pairs. Considering additional, typologically varied source and target languages also enables us to draw some generalizations regarding this morphology-oriented evaluation procedure.