Christine Maroti


2023

pdf
Quality Fit for Purpose: Building Business Critical Errors Test Suites
Mariana Cabeça | Marianna Buchicchio | Madalena Gonçalves | Christine Maroti | João Godinho | Pedro Coelho | Helena Moniz | Alon Lavie
Proceedings of the 24th Annual Conference of the European Association for Machine Translation

This paper illustrates a new methodology based on Test Suites (Avramidis et al., 2018) with focus on Business Critical Errors (BCEs) (Stewart et al., 2022) to evaluate the output of Machine Translation (MT) and Quality Estimation (QE) systems. We demonstrate the value of relying on semi-automatic evaluation done through scalable BCE-focused Test Suites to monitor both MT and QE systems’ performance for 8 language pairs (LPs) and a total of 4 error categories. This approach allows us to not only track the impact of new features and implementations in a real business environment, but also to identify strengths and weaknesses in models regarding different error types, and subsequently know what to improve henceforth.

2022

pdf
CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task
Ricardo Rei | Marcos Treviso | Nuno M. Guerreiro | Chrysoula Zerva | Ana C Farinha | Christine Maroti | José G. C. de Souza | Taisiya Glushkova | Duarte Alves | Luisa Coheur | Alon Lavie | André F. T. Martins
Proceedings of the Seventh Conference on Machine Translation (WMT)

We present the joint contribution of IST and Unbabel to the WMT 2022 Shared Task on Quality Estimation (QE). Our team participated in all three subtasks: (i) Sentence and Word-level Quality Prediction; (ii) Explainable QE; and (iii) Critical Error Detection. For all tasks we build on top of the COMET framework, connecting it with the predictor-estimator architecture of OpenKiwi, and equipping it with a word-level sequence tagger and an explanation extractor. Our results suggest that incorporating references during pretraining improves performance across several language pairs on downstream tasks, and that jointly training with sentence and word-level objectives yields a further boost. Furthermore, combining attention and gradient information proved to be the top strategy for extracting good explanations of sentence-level QE models. Overall, our submissions achieved the best results for all three tasks for almost all language pairs by a considerable margin.

2020

pdf
A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality?
Julia Ive | Lucia Specia | Sara Szoc | Tom Vanallemeersch | Joachim Van den Bogaert | Eduardo Farah | Christine Maroti | Artur Ventura | Maxim Khalilov
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce a machine translation dataset for three pairs of languages in the legal domain with post-edited high-quality neural machine translation and independent human references. The data was collected as part of the EU APE-QUEST project and comprises crawled content from EU websites with translation from English into three European languages: Dutch, French and Portuguese. Altogether, the data consists of around 31K tuples including a source sentence, the respective machine translation by a neural machine translation system, a post-edited version of such translation by a professional translator, and - where available - the original reference translation crawled from parallel language websites. We describe the data collection process, provide an analysis of the resulting post-edits and benchmark the data using state-of-the-art quality estimation and automatic post-editing models. One interesting by-product of our post-editing analysis suggests that neural systems built with publicly available general domain data can provide high-quality translations, even though comparison to human references suggests that this quality is quite low. This makes our dataset a suitable candidate to test evaluation metrics. The data is freely available as an ELRC-SHARE resource.

2019

pdf
APE-QUEST
Joachim Van den Bogaert | Heidi Depraetere | Sara Szoc | Tom Vanallemeersch | Koen Van Winckel | Frederic Everaert | Lucia Specia | Julia Ive | Maxim Khalilov | Christine Maroti | Eduardo Farah | Artur Ventura
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks