The FAUST Corpus of Adequacy Assessments for Real-World Machine Translation Output

Daniele Pighin; Lluís Màrquez; Lluís Formiga

Abstract

We present a corpus of 11,292 real-world English-to-Spanish automatic translations annotated with relative (ranking) and absolute (adequate/non-adequate) quality assessments. The translation requests, collected through the popular translation portal http://reverso.net, provide a highly varied sample of real-world machine translation (MT) usage: from complete sentences to units of one or two words, from well-formed to barely intelligible texts, and from technical documents to colloquial and slang snippets. In this paper, we present 1) a preliminary annotation experiment that we carried out to select the most appropriate quality criterion for these data, 2) a graph-based methodology inspired by Interactive Genetic Algorithms to reduce the annotation effort, and 3) the outcomes of the full-scale annotation experiment, which yield a valuable and original resource for the analysis and characterization of MT output quality.

Anthology ID: L12-1181
Volume: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month: May
Year: 2012
Address: Istanbul, Turkey
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 29–35
URL: http://www.lrec-conf.org/proceedings/lrec2012/pdf/370_Paper.pdf
Cite (ACL): Daniele Pighin, Lluís Màrquez, and Lluís Formiga. 2012. The FAUST Corpus of Adequacy Assessments for Real-World Machine Translation Output. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 29–35, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal): The FAUST Corpus of Adequacy Assessments for Real-World Machine Translation Output (Pighin et al., LREC 2012)
PDF: http://www.lrec-conf.org/proceedings/lrec2012/pdf/370_Paper.pdf
