@inproceedings{haq-etal-2025-audio,
    title = "Audio-Based Crowd-Sourced Evaluation of Machine Translation Quality",
    author = "Haq, Sami  and
      Castilho, Sheila  and
      Graham, Yvette",
    editor = "Haddow, Barry  and
      Kocmi, Tom  and
      Koehn, Philipp  and
      Monz, Christof",
    booktitle = "Proceedings of the Tenth Conference on Machine Translation",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2025.wmt-1.3/",
    pages = "52--63",
    ISBN = "979-8-89176-341-8",
    abstract = "Machine Translation (MT) has achieved remarkable performance, with growing interest in speech translation and multimodal approaches. However, despite these advancements, MT quality assessment remains largely text-centric, typically relying on human experts who read and compare texts. Since many real-world MT applications (e.g., Google Translate Voice Mode, iFLYTEK Translator) involve translation being spoken rather than printed or read, a more natural way to assess translation quality would be through speech as opposed to text-only evaluations. This study compares text-only and audio-based evaluations of 10 MT systems from the WMT General MT Shared Task, using crowd-sourced judgments collected via Amazon Mechanical Turk. We additionally performed statistical significance testing and self-replication experiments to test the reliability and consistency of the audio-based approach. Crowd-sourced assessments based on audio yield rankings largely consistent with text-only evaluations but, in some cases, identify significant differences between translation systems. We attribute this to speech{'}s richer, more natural modality and propose incorporating speech-based assessments into future MT evaluation frameworks."
}