Bram Vanroy


2022

LeConTra: A Learner Corpus of English-to-Dutch News Translation
Bram Vanroy | Lieve Macken
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present LeConTra, a learner corpus consisting of English-to-Dutch news translations enriched with translation process data. Three students of a Master’s programme in Translation were asked to translate 50 different English journalistic texts of approximately 250 tokens each. Because we also collected translation process data in the form of keystroke logging, our dataset can be used in different research strands such as translation process research, learner corpus research, and corpus-based translation studies. Reference translations, without process data, are also included. The data has been manually segmented and tokenized, and manually aligned at both segment and word level, leading to a high-quality corpus with token-level process data. The data is freely accessible via the Translation Process Research DataBase, which emphasises our commitment to distributing our dataset. The tool that was built for manual sentence segmentation and tokenization, Mantis, is also available as an open-source aid for data processing.

Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
Helena Moniz | Lieve Macken | Andrew Rufener | Loïc Barrault | Marta R. Costa-jussà | Christophe Declercq | Maarit Koponen | Ellie Kemp | Spyridon Pilos | Mikel L. Forcada | Carolina Scarton | Joachim Van den Bogaert | Joke Daems | Arda Tezcan | Bram Vanroy | Margot Fonteyne
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

Literary translation as a three-stage process: machine translation, post-editing and revision
Lieve Macken | Bram Vanroy | Luca Desmet | Arda Tezcan
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

This study focuses on English-Dutch literary translations that were created in a professional environment using an MT-enhanced workflow consisting of a three-stage process of automatic translation followed by post-editing and (mainly) monolingual revision. We compared the three successive versions of the target texts, using different automatic metrics to measure the (dis)similarity between the consecutive versions, and analyzed the linguistic characteristics of the three translation variants. Additionally, on a subset of 200 segments, we manually annotated all errors in the machine translation output and classified the different editing actions that were carried out. The results show that more editing occurred during revision than during post-editing and that the types of editing actions differed between the two stages.
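As a purely illustrative sketch (not the metrics, segments, or code used in the paper), the degree of change between consecutive versions of a segment can be approximated with a character-based similarity ratio from Python's standard library; the Dutch segments below are invented placeholders:

# Illustrative sketch: quantify how much a segment changed between the
# MT output, the post-edited version, and the revised version, using
# difflib's similarity ratio as a simple stand-in for automatic metrics.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-based similarity in [0, 1]; 1.0 means identical strings."""
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical segment in its three successive versions (placeholder text).
mt_output = "Het oude huis stond stil en grijs op de heuvel."
post_edited = "Het oude huis stond er stil en grijs bij op de heuvel."
revised = "Stil en grijs stond het oude huis boven op de heuvel."

print(f"MT -> post-edit:       {similarity(mt_output, post_edited):.2f}")
print(f"post-edit -> revision: {similarity(post_edited, revised):.2f}")
# A lower ratio between post-edit and revision would indicate that more
# editing happened during revision, the pattern reported in the paper.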

2020

LT3 at SemEval-2020 Task 7: Comparing Feature-Based and Transformer-Based Approaches to Detect Funny Headlines
Bram Vanroy | Sofie Labat | Olha Kaminska | Els Lefever | Veronique Hoste
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper presents two different systems for the SemEval shared task 7 on Assessing Humor in Edited News Headlines, sub-task 1, where the aim was to estimate the intensity of humor generated in edited headlines. Our first system is a feature-based machine learning system that combines different types of information (e.g. word embeddings, string similarity, part-of-speech tags, perplexity scores, named entity recognition) in a Nu Support Vector Regressor (NuSVR). The second system is a deep learning-based approach that uses the pre-trained language model RoBERTa to learn latent features in the news headlines that are useful to predict the funniness of each headline. The latter system was also our final submission to the competition and ranked seventh among the 49 participating teams, with a root-mean-square error (RMSE) of 0.5253.
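For illustration only, a minimal scikit-learn sketch of the first, feature-based approach might look as follows; the feature values, gold scores, and hyperparameters are placeholders and do not reproduce the system described in the paper:

# Minimal sketch of a feature-based NuSVR regressor for humour intensity,
# assuming pre-computed per-headline features (embedding similarity,
# string similarity, perplexity, ...). All values below are placeholders.
import numpy as np
from sklearn.svm import NuSVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import mean_squared_error

# Each row: one edited headline; each column: one engineered feature.
X_train = np.array([
    [0.82, 0.40, 35.1],  # [embedding similarity, string similarity, perplexity]
    [0.15, 0.90, 12.7],
    [0.55, 0.62, 48.3],
])
y_train = np.array([1.4, 0.2, 2.1])  # gold humour intensity scores

X_test = np.array([[0.70, 0.55, 20.0]])
y_test = np.array([1.0])

# Scale the features and fit a Nu Support Vector Regressor.
model = make_pipeline(StandardScaler(), NuSVR(nu=0.5, C=1.0, kernel="rbf"))
model.fit(X_train, y_train)

predictions = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print(f"Predicted intensity: {predictions[0]:.3f}, RMSE: {rmse:.3f}")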

2019

Modelling word translation entropy and syntactic equivalence with machine learning
Bram Vanroy | Orphée De Clercq | Lieve Macken
Proceedings of the Second MEMENTO workshop on Modelling Parameters of Cognitive Effort in Translation Production