An experiment in comparative evaluation: humans vs. computers

Andrei Popescu-Belis


Abstract
This paper reports results from an experiment aimed at comparing evaluation metrics for machine translation. Implemented as a workshop at a major conference in 2002, the experiment defined an evaluation task, a description of the metrics, and test data consisting of human and machine translations of two texts. Several metrics, applied either by human judges or automatically, were used, and the overall results were analyzed. Most human and automated metrics provided generally consistent rankings of the candidate translations; the ranking of the human translations matched the one provided by translation professionals; and human translations were clearly distinguished from machine translations.
Anthology ID:
2003.mtsummit-papers.41
Volume:
Proceedings of Machine Translation Summit IX: Papers
Month:
September 23-27
Year:
2003
Address:
New Orleans, USA
Venue:
MTSummit
URL:
https://aclanthology.org/2003.mtsummit-papers.41
Cite (ACL):
Andrei Popescu-Belis. 2003. An experiment in comparative evaluation: humans vs. computers. In Proceedings of Machine Translation Summit IX: Papers, New Orleans, USA.
Cite (Informal):
An experiment in comparative evaluation: humans vs. computers (Popescu-Belis, MTSummit 2003)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2003.mtsummit-papers.41.pdf