Artur Ventura


2020

A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality?
Julia Ive | Lucia Specia | Sara Szoc | Tom Vanallemeersch | Joachim Van den Bogaert | Eduardo Farah | Christine Maroti | Artur Ventura | Maxim Khalilov
Proceedings of the Twelfth Language Resources and Evaluation Conference

We introduce a machine translation dataset for three pairs of languages in the legal domain with post-edited high-quality neural machine translation and independent human references. The data was collected as part of the EU APE-QUEST project and comprises crawled content from EU websites with translation from English into three European languages: Dutch, French and Portuguese. Altogether, the data consists of around 31K tuples including a source sentence, the respective machine translation by a neural machine translation system, a post-edited version of such translation by a professional translator, and - where available - the original reference translation crawled from parallel language websites. We describe the data collection process, provide an analysis of the resulting post-edits and benchmark the data using state-of-the-art quality estimation and automatic post-editing models. One interesting by-product of our post-editing analysis suggests that neural systems built with publicly available general domain data can provide high-quality translations, even though comparison to human references suggests that this quality is quite low. This makes our dataset a suitable candidate to test evaluation metrics. The data is freely available as an ELRC-SHARE resource.

2019

APE-QUEST
Joachim Van den Bogaert | Heidi Depraetere | Sara Szoc | Tom Vanallemeersch | Koen Van Winckel | Frederic Everaert | Lucia Specia | Julia Ive | Maxim Khalilov | Christine Maroti | Eduardo Farah | Artur Ventura
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks