Mateusz Krubiński

2022

From COMET to COMES – Can Summary Evaluation Benefit from Translation Evaluation?
Mateusz Krubiński | Pavel Pecina
Proceedings of the 3rd Workshop on Evaluation and Comparison of NLP Systems

2021

Just Ask! Evaluating Machine Translation by Asking and Answering Questions
Mateusz Krubiński | Erfan Ghadery | Marie-Francine Moens | Pavel Pecina
Proceedings of the Sixth Conference on Machine Translation

In this paper, we show that automatically generated questions and answers can be used to evaluate the quality of Machine Translation (MT) systems. Building on recent work on the evaluation of abstractive text summarization, we propose a new metric for system-level MT evaluation, compare it with other state-of-the-art solutions, and demonstrate its robustness through experiments on various MT directions.

MTEQA at WMT21 Metrics Shared Task
Mateusz Krubiński | Erfan Ghadery | Marie-Francine Moens | Pavel Pecina
Proceedings of the Sixth Conference on Machine Translation

In this paper, we describe our submission to the WMT 2021 Metrics Shared Task. We use automatically generated questions and answers to evaluate the quality of Machine Translation (MT) systems. Our submission builds upon the recently proposed MTEQA framework. Experiments on WMT20 evaluation datasets show that, at the system level, the MTEQA metric achieves performance comparable with other state-of-the-art solutions while considering only part of the information in the whole translation.

2020

This paper describes the submission to the WMT20 shared news translation task by Samsung R&D Institute Poland. We submitted systems for six language directions: English to Czech, Czech to English, English to Polish, Polish to English, English to Inuktitut and Inuktitut to English. For each direction, we trained a single-direction model. However, the models for directions involving English, Polish and Czech were derived from a common multilingual base, which was later fine-tuned on each particular direction. For all the translation directions, we used a similar training regime, with iterative improvement of the training corpora through back-translation, and model ensembling. For the En → Cs direction, we additionally leveraged document-level information by re-ranking the beam output with a separate model.

Venues

wmt (3)
eval4nlp (1)