Comparison Between ATA Grading Framework Scores and Auto Scores
Abstract
The authors of this study compared two types of translation quality scores assigned to the same sets of translation samples: 1) the ATA Grading Framework scores assigned by human experts, and 2) auto scores, including BLEU, TER, and COMET (with and without reference). They further explored the impact of different reference translations on the auto scores. Key findings from this study include: 1. auto scores that rely on reference translations depend heavily on which reference is used; 2. referenceless COMET seems promising when it is used to evaluate translations of short passages (250-300 English words); and 3. evidence suggests good agreement between the ATA-Framework score and some auto scores within a middle range, but the relationship becomes non-monotonic beyond the middle range. This study is subject to the limitation of a small sample size and is a retrospective exploratory study not specifically designed to test a pre-defined hypothesis.
- Anthology ID:
- 2022.amta-upg.13
- Volume:
- Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track)
- Month:
- September
- Year:
- 2022
- Address:
- Orlando, USA
- Editors:
- Janice Campbell, Stephen Larocca, Jay Marciano, Konstantin Savenkov, Alex Yanishevsky
- Venue:
- AMTA
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 181–201
- URL:
- https://aclanthology.org/2022.amta-upg.13
- Cite (ACL):
- Evelyn Garland, Carola Berger, and Jon Ritzdorf. 2022. Comparison Between ATA Grading Framework Scores and Auto Scores. In Proceedings of the 15th Biennial Conference of the Association for Machine Translation in the Americas (Volume 2: Users and Providers Track and Government Track), pages 181–201, Orlando, USA. Association for Machine Translation in the Americas.
- Cite (Informal):
- Comparison Between ATA Grading Framework Scores and Auto Scores (Garland et al., AMTA 2022)