GECO-MT: The Ghent Eye-tracking Corpus of Machine Translation

Toon Colman, Margot Fonteyne, Joke Daems, Nicolas Dirix, Lieve Macken


Abstract
In the present paper, we describe a large corpus of eye movement data, collected during natural reading of a human translation and a machine translation of a full novel. This data set, called GECO-MT (Ghent Eye-tracking Corpus of Machine Translation), expands upon an earlier corpus called GECO (Ghent Eye-tracking Corpus) by Cop et al. (2017). The eye movement data in GECO-MT will be used in future research to investigate the effect of machine translation on the reading process and the effects of various error types on reading. In this article, we describe in detail the materials and data collection procedure of GECO-MT. Extensive information on the language proficiency of our participants is given, as well as a comparison with the participants of the original GECO. We investigate the distribution of a selection of important eye movement variables and explore the possibilities for future analyses of the data. GECO-MT is freely available at https://www.lt3.ugent.be/resources/geco-mt.
Anthology ID:
2022.lrec-1.4
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
29–38
URL:
https://aclanthology.org/2022.lrec-1.4
Cite (ACL):
Toon Colman, Margot Fonteyne, Joke Daems, Nicolas Dirix, and Lieve Macken. 2022. GECO-MT: The Ghent Eye-tracking Corpus of Machine Translation. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 29–38, Marseille, France. European Language Resources Association.
Cite (Informal):
GECO-MT: The Ghent Eye-tracking Corpus of Machine Translation (Colman et al., LREC 2022)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2022.lrec-1.4.pdf