Experimental comparison of MT evaluation methods: RED vs. BLEU

Yasuhiro Akiba, Eiichiro Sumita, Hiromi Nakaiwa, Seiichi Yamamoto, Hiroshi G. Okuno


Abstract
This paper experimentally compares two automatic evaluators, RED and BLEU, to determine how close the evaluation results of each automatic evaluator are to average evaluation results by human evaluators, following the ATR standard of MT evaluation. This paper gives several cautionary remarks intended to prevent MT developers from drawing misleading conclusions when using the automatic evaluators. In addition, this paper reports a way of using the automatic evaluators so that their results agree with those of human evaluators.
Anthology ID:
2003.mtsummit-papers.1
Volume:
Proceedings of Machine Translation Summit IX: Papers
Month:
September 23-27
Year:
2003
Address:
New Orleans, USA
Venue:
MTSummit
URL:
https://aclanthology.org/2003.mtsummit-papers.1
Cite (ACL):
Yasuhiro Akiba, Eiichiro Sumita, Hiromi Nakaiwa, Seiichi Yamamoto, and Hiroshi G. Okuno. 2003. Experimental comparison of MT evaluation methods: RED vs. BLEU. In Proceedings of Machine Translation Summit IX: Papers, New Orleans, USA.
Cite (Informal):
Experimental comparison of MT evaluation methods: RED vs. BLEU (Akiba et al., MTSummit 2003)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2003.mtsummit-papers.1.pdf