BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation
Yuchen Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Jian Yang, Haoyang Huang, Rico Sennrich, Ryan Cotterell, Mrinmaya Sachan, Ming Zhou
Abstract
Standard automatic metrics, e.g. BLEU, are not reliable for document-level MT evaluation. They can neither distinguish document-level improvements in translation quality from sentence-level ones, nor identify the discourse phenomena that cause context-agnostic translations. This paper introduces a novel automatic metric BlonDe to widen the scope of automatic MT evaluation from sentence to document level. BlonDe takes discourse coherence into consideration by categorizing discourse-related spans and calculating the similarity-based F1 measure of categorized spans. We conduct extensive comparisons on a newly constructed dataset BWB. The experimental results show that BlonDe possesses better selectivity and interpretability at the document-level, and is more sensitive to document-level nuances. In a large-scale human study, BlonDe also achieves significantly higher Pearson’s r correlation with human judgments compared to previous metrics.- Anthology ID:
- 2022.naacl-main.111
- Volume:
- Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Venue:
- NAACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1550–1565
- Language:
- URL:
- https://aclanthology.org/2022.naacl-main.111
- DOI:
- 10.18653/v1/2022.naacl-main.111
- Cite (ACL):
- Yuchen Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Jian Yang, Haoyang Huang, Rico Sennrich, Ryan Cotterell, Mrinmaya Sachan, and Ming Zhou. 2022. BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1550–1565, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation (Jiang et al., NAACL 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.naacl-main.111.pdf
- Code
- eleanorjiang/blonde