Experimental comparison of MT evaluation methods: RED vs. BLEU
Yasuhiro Akiba, Eiichiro Sumita, Hiromi Nakaiwa, Seiichi Yamamoto, Hiroshi G. Okuno
Abstract
This paper experimentally compares two automatic evaluators, RED and BLEU, to determine how close the evaluation results of each automatic evaluator are to the average evaluation results of human evaluators, following the ATR standard of MT evaluation. This paper gives several cautionary remarks intended to prevent MT developers from drawing misleading conclusions when using the automatic evaluators. In addition, this paper reports a way of using the automatic evaluators so that their results agree with those of human evaluators.
- Anthology ID:
- 2003.mtsummit-papers.1
- Volume:
- Proceedings of Machine Translation Summit IX: Papers
- Month:
- September 23-27
- Year:
- 2003
- Address:
- New Orleans, USA
- Venue:
- MTSummit
- URL:
- https://aclanthology.org/2003.mtsummit-papers.1
- Cite (ACL):
- Yasuhiro Akiba, Eiichiro Sumita, Hiromi Nakaiwa, Seiichi Yamamoto, and Hiroshi G. Okuno. 2003. Experimental comparison of MT evaluation methods: RED vs. BLEU. In Proceedings of Machine Translation Summit IX: Papers, New Orleans, USA.
- Cite (Informal):
- Experimental comparison of MT evaluation methods: RED vs. BLEU (Akiba et al., MTSummit 2003)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2003.mtsummit-papers.1.pdf