Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?

Shenbin Qian; Constantin Orasan; Diptesh Kanojia; Félix Do Carmo

doi:10.18653/v1/2024.wat-1.4

Abstract

This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of user-generated content (UGC) that contains emotional expressions, without the use of reference translations. To this end, we employ an existing emotion-related dataset with human-annotated errors and calculate quality evaluation scores based on the Multi-dimensional Quality Metrics (MQM) framework. We compare the accuracy of several LLMs with that of our fine-tuned baseline models under in-context learning and parameter-efficient fine-tuning (PEFT) scenarios. We find that PEFT of LLMs yields better score prediction than the fine-tuned baseline models, together with human-interpretable explanations. However, a manual analysis of LLM outputs reveals that they still exhibit problems, such as refusing to reply to a prompt and producing unstable output, when evaluating machine translation of UGC.
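As a rough illustration of how annotated errors can be turned into a segment-level quality score in an MQM-style setup, the sketch below sums severity-weighted penalties and normalises them by source length. The severity weights (1, 5, 10), the per-100-words normalisation, the 0 to 100 range, and the function name `mqm_style_score` are all assumptions made for this example; they are not taken from the paper, whose exact scoring formula is not given on this page.

```python
from collections import Counter

# Assumed severity weights (a common MQM-style convention, not the paper's values).
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


def mqm_style_score(severities, source_word_count):
    """Return a score in [0, 100]: 100 minus the weighted penalty per 100 source words.

    `severities` is a list of severity labels, one per annotated error.
    The result is floored at 0 so heavily penalised segments do not go negative.
    """
    if source_word_count <= 0:
        raise ValueError("source_word_count must be positive")
    counts = Counter(severities)
    penalty = sum(SEVERITY_WEIGHTS[label] * n for label, n in counts.items())
    return max(0.0, 100.0 - 100.0 * penalty / source_word_count)


if __name__ == "__main__":
    # One minor and one major error in a 20-word source segment.
    print(mqm_style_score(["minor", "major"], 20))
```

A score computed this way could serve as the regression target that a quality estimator is asked to predict from the source and its machine translation alone, which is the reference-free setting the abstract describes.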

Anthology ID: 2024.wat-1.4
Volume: Proceedings of the Eleventh Workshop on Asian Translation (WAT 2024)
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Toshiaki Nakazawa, Isao Goto
Venue: WAT
Publisher: Association for Computational Linguistics
Pages: 45–55
URL: https://preview.aclanthology.org/ingest_wac_2008/2024.wat-1.4/
DOI: 10.18653/v1/2024.wat-1.4
Cite (ACL): Shenbin Qian, Constantin Orasan, Diptesh Kanojia, and Félix Do Carmo. 2024. Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content? In Proceedings of the Eleventh Workshop on Asian Translation (WAT 2024), pages 45–55, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content? (Qian et al., WAT 2024)
PDF: https://preview.aclanthology.org/ingest_wac_2008/2024.wat-1.4.pdf
Supplementary material: 2024.wat-1.4.SupplementaryMaterial.txt
