The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results

Marina Fomicheva, Piyawat Lertvittayakumjorn, Wei Zhao, Steffen Eger, Yang Gao


Abstract
In this paper, we introduce the Eval4NLP-2021 shared task on explainable quality estimation. Given a source-translation pair, this shared task requires participating systems not only to provide a sentence-level score indicating the overall quality of the translation, but also to explain this score by identifying the words that negatively impact translation quality. We present the data, annotation guidelines, and evaluation setup of the shared task, describe the six participating systems, and analyze the results. To the best of our knowledge, this is the first shared task on explainable NLP evaluation metrics. Datasets and results are available at https://github.com/eval4nlp/SharedTask2021.
Anthology ID:
2021.eval4nlp-1.17
Volume:
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Venue:
Eval4NLP
Publisher:
Association for Computational Linguistics
Pages:
165–178
URL:
https://aclanthology.org/2021.eval4nlp-1.17
DOI:
10.18653/v1/2021.eval4nlp-1.17
Cite (ACL):
Marina Fomicheva, Piyawat Lertvittayakumjorn, Wei Zhao, Steffen Eger, and Yang Gao. 2021. The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 165–178, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
The Eval4NLP Shared Task on Explainable Quality Estimation: Overview and Results (Fomicheva et al., Eval4NLP 2021)
PDF:
https://preview.aclanthology.org/starsem-semeval-split/2021.eval4nlp-1.17.pdf
Code:
eval4nlp/sharedtask2021
Data:
MLQE-PE