Explainable CED: A Dataset for Explainable Critical Error Detection in Machine Translation

Dahyun Jung; Sugyeong Eo; Chanjun Park; Heui-Seok Lim

doi:10.18653/v1/2024.naacl-srw.4

Abstract

Critical error detection (CED) in machine translation is a task that aims to detect errors that significantly distort the intended meaning. However, existing work on CED lacks explainability because it does not address the reasons behind catastrophic errors. To address this limitation, we propose Explainable CED, a dataset that adds error explanation and correction attributes for critical errors. To reduce time costs and mitigate human annotation bias, we leverage a large language model in the data construction process. To improve the quality of the dataset and mitigate hallucination, we compare responses from the model and introduce an additional data filtering method based on feedback scoring. Experiments demonstrate that the dataset provides consistent explanations and revisions for errors, validating its reliability.

Anthology ID: 2024.naacl-srw.4
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Yang (Trista) Cao, Isabel Papadimitriou, Anaelia Ovalle, Marcos Zampieri, Francis Ferraro, Swabha Swayamdipta
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 25–35
URL: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.naacl-srw.4/
DOI: 10.18653/v1/2024.naacl-srw.4
Cite (ACL): Dahyun Jung, Sugyeong Eo, Chanjun Park, and Heuiseok Lim. 2024. Explainable CED: A Dataset for Explainable Critical Error Detection in Machine Translation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 25–35, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Explainable CED: A Dataset for Explainable Critical Error Detection in Machine Translation (Jung et al., NAACL 2024)
PDF: https://preview.aclanthology.org/add-emnlp-2024-awards/2024.naacl-srw.4.pdf