AskQE: Question Answering as Automatic Evaluation for Machine Translation

Dayeon Ki, Kevin Duh, Marine Carpuat
Abstract
How can a monolingual English speaker determine whether an automatic translation into French is good enough to be shared? Existing MT error detection and quality estimation (QE) techniques do not address this practical scenario. We introduce AskQE, a question generation and answering framework designed to detect critical MT errors and provide actionable feedback, helping users decide whether to accept or reject MT outputs even without knowledge of the target language. Using ContraTICO, a dataset of contrastive synthetic MT errors in the COVID-19 domain, we explore design choices for AskQE and develop an optimized version that relies on LLaMA-3 70B and entailed facts to guide question generation. We evaluate the resulting system on BioMQM, a dataset of naturally occurring MT errors, where AskQE achieves higher Kendall's Tau correlation and decision accuracy with human ratings than other QE metrics.
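As a rough illustration of the idea the abstract describes, here is a minimal Python sketch of an AskQE-style answer-consistency score. The helpers generate_questions and answer_question are hypothetical stand-ins for LLM prompts (the paper uses LLaMA-3 70B with entailed facts to guide question generation); comparing answers against a back-translation of the MT output, and scoring agreement with token-level F1, are our assumptions for the sketch, not details stated in the abstract.

```python
# Hypothetical sketch of an AskQE-style QA consistency score; not the
# authors' exact pipeline. The two placeholder helpers would wrap an LLM
# (e.g., LLaMA-3 70B in the paper's setting).

from collections import Counter

def generate_questions(source: str) -> list[str]:
    """Placeholder (assumption): prompt an LLM to extract facts entailed by
    the source sentence and turn each fact into a question."""
    raise NotImplementedError

def answer_question(question: str, context: str) -> str:
    """Placeholder (assumption): prompt the LLM to answer `question` using
    only `context` (the source, or a back-translation of the MT output)."""
    raise NotImplementedError

def token_f1(pred: str, gold: str) -> float:
    """Standard token-level F1 between two short answers."""
    p, g = pred.lower().split(), gold.lower().split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def askqe_score(source: str, backtranslated_mt: str) -> float:
    """Average answer agreement: answers that diverge between the source and
    the back-translated MT suggest a meaning-changing translation error."""
    questions = generate_questions(source)
    scores = [
        token_f1(answer_question(q, backtranslated_mt),
                 answer_question(q, source))
        for q in questions
    ]
    # No checkable facts means no detected divergence.
    return sum(scores) / len(scores) if scores else 1.0
```

Segment-level scores from such a function could then be correlated with human ratings using, for example, scipy.stats.kendalltau, matching the Kendall's Tau evaluation mentioned above.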
Anthology ID:
2025.findings-acl.899
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:
Findings | WS
Publisher:
Association for Computational Linguistics
Pages:
17478–17515
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.899/
Cite (ACL):
Dayeon Ki, Kevin Duh, and Marine Carpuat. 2025. AskQE: Question Answering as Automatic Evaluation for Machine Translation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 17478–17515, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
AskQE: Question Answering as Automatic Evaluation for Machine Translation (Ki et al., Findings 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.findings-acl.899.pdf