Abstract
Quality Estimation (QE) for Machine Translation has been shown to reach relatively high accuracy in predicting sentence-level scores, relying on pretrained contextual embeddings and human-produced quality scores. However, the lack of explanations accompanying the decisions made by end-to-end neural models makes the results difficult to interpret. Furthermore, word-level annotated datasets are rare due to the prohibitive effort required to produce them, although they could provide interpretable signals in addition to sentence-level QE outputs. In this paper, we propose a novel QE architecture which tackles both the word-level data scarcity and the interpretability limitations of recent approaches. Sentence-level and word-level components are jointly pretrained through an attention mechanism based on synthetic data and a set of MT metrics embedded in a common space. Our approach is evaluated on the Eval4NLP 2021 shared task and our submissions reach the first position in all language pairs. The extraction of metric-to-input attention weights shows that different metrics focus on different parts of the source and target text, providing strong rationales in the decision-making process of the QE model.
- Anthology ID:
- 2021.eval4nlp-1.15
- Volume:
- Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
- Month:
- November
- Year:
- 2021
- Address:
- Punta Cana, Dominican Republic
- Editors:
- Yang Gao, Steffen Eger, Wei Zhao, Piyawat Lertvittayakumjorn, Marina Fomicheva
- Venue:
- Eval4NLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 146–156
- URL:
- https://aclanthology.org/2021.eval4nlp-1.15
- DOI:
- 10.18653/v1/2021.eval4nlp-1.15
- Cite (ACL):
- Raphael Rubino, Atsushi Fujita, and Benjamin Marie. 2021. Error Identification for Machine Translation with Metric Embedding and Attention. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 146–156, Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal):
- Error Identification for Machine Translation with Metric Embedding and Attention (Rubino et al., Eval4NLP 2021)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/2021.eval4nlp-1.15.pdf
- Data
- OPUS