Maribel Montero Perez


2022

Writing in a second Language with Machine translation (WiLMa)
Margot Fonteyne | Maribel Montero Perez | Joke Daems | Lieve Macken
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

The WiLMa project aims to assess the effects of using machine translation (MT) tools on the writing processes of second language (L2) learners of varying proficiency. Particular attention is given to individual variation in learners’ tool use.

2010

Data Collection and IPR in Multilingual Parallel Corpora. Dutch Parallel Corpus
Orphée De Clercq | Maribel Montero Perez
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

After three years of work, the Dutch Parallel Corpus (DPC) project has come to an end. The finalized corpus is a ten-million-word, high-quality, sentence-aligned, bidirectional parallel corpus of Dutch, English and French, with Dutch as the central language. In this paper we present the corpus and formulate some basic data collection principles based on the work carried out for the project. Building a corpus is a difficult and time-consuming task, especially when every text sample included has to be cleared of copyright restrictions. The DPC is balanced according to five text types (literature, journalistic texts, instructive texts, administrative texts and texts treating external communication) and four translation directions (Dutch-English, English-Dutch, Dutch-French and French-Dutch). All of the text material was cleared for copyright. The data collection process required the involvement of different text providers, which resulted in four different licence agreements being drawn up. Problems such as an unknown source language, copyright issues and changes to the corpus design are discussed in detail and illustrated with examples so as to be of help to future corpus compilers.