Abstract
Traditional machine translation evaluation metrics such as BLEU and WER are widely used, but they correlate poorly with human judgements because they represent word similarity poorly and require strict identity matching between words. In this paper, we propose modifications to these two metrics based on word embeddings. The evaluation results show that our modifications significantly improve their correlation with human judgements.
- Anthology ID:
- W16-4505
- Volume:
- Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Patrik Lambert, Bogdan Babych, Kurt Eberle, Rafael E. Banchs, Reinhard Rapp, Marta R. Costa-jussà
- Venue:
- HyTra
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 33–41
- URL:
- https://aclanthology.org/W16-4505
- Cite (ACL):
- Haozhou Wang and Paola Merlo. 2016. Modifications of Machine Translation Evaluation Metrics by Using Word Embeddings. In Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6), pages 33–41, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- Modifications of Machine Translation Evaluation Metrics by Using Word Embeddings (Wang & Merlo, HyTra 2016)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/W16-4505.pdf