Abstract
Building on our GPT-4 LQA research in MT, this study identifies top LLMs for an LQA pipeline with up to three models. LLMs such as GPT-4, GPT-4o, GPT-4 Turbo, Google Vertex, Anthropic’s Claude 3, and Llama-3 are prompted using the MQM error typology. These models generate segment-wise outputs describing translation errors, scored by severity and DQF-MQM penalties. The study evaluates four language pairs (English-Spanish, English-Chinese, English-German, and English-Portuguese) using datasets from our 2024 State of MT Report across eight domains. LLM outputs are correlated with human judgments, and models are ranked by their alignment with human assessments of penalty score, issue presence, type, and severity. This research proposes an LQA pipeline with up to three models, weighted by output quality, highlighting LLMs’ potential to enhance MT review processes and improve translation quality.
- Anthology ID:
- 2024.amta-presentations.12
- Volume:
- Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations)
- Month:
- September
- Year:
- 2024
- Address:
- Chicago, USA
- Editors:
- Marianna Martindale, Janice Campbell, Konstantin Savenkov, Shivali Goel
- Venue:
- AMTA
- Publisher:
- Association for Machine Translation in the Americas
- Pages:
- 154–183
- URL:
- https://preview.aclanthology.org/add_missing_videos/2024.amta-presentations.12/
- Cite (ACL):
- Daria Sinitsyna and Konstantin Savenkov. 2024. Comparative Evaluation of Large Language Models for Linguistic Quality Assessment in Machine Translation. In Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 2: Presentations), pages 154–183, Chicago, USA. Association for Machine Translation in the Americas.
- Cite (Informal):
- Comparative Evaluation of Large Language Models for Linguistic Quality Assessment in Machine Translation (Sinitsyna & Savenkov, AMTA 2024)
- PDF:
- https://preview.aclanthology.org/add_missing_videos/2024.amta-presentations.12.pdf