Generative Error Correction for Emotion-aware Speech-to-text Translation

Zhengdong Yang, Sheng Li, Chenhui Chu


Abstract
This paper explores emotion-aware speech-to-text translation (ST) using generative error correction (GER) by large language models (LLMs). Despite recent advances in ST, the impact of emotional content has been overlooked. First, we enhance the translation of emotional speech by adopting the GER paradigm: we finetune an LLM to generate the translation based on the decoded N-best hypotheses. Moreover, we incorporate emotion and sentiment labels into the LLM finetuning process to enable the model to consider the emotional content. In addition, we project the ST model’s latent representation into the LLM embedding space to further improve emotion recognition and translation. Experiments on an English-Chinese dataset show the effectiveness of combining GER, emotion and sentiment labels, and the projector for emotion-aware ST. Our code is available at https://github.com/N-Orien/EmoST.
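As a rough illustration of the GER setup described in the abstract, the sketch below assembles one finetuning example from N-best hypotheses and optional emotion and sentiment labels. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `build_ger_example`, the prompt wording, the label strings, and the "labels first, then translation" target layout are all hypothetical, and the projector of the ST model's latent representation is not covered here.

```python
from typing import Optional, Sequence


def build_ger_example(
    hypotheses: Sequence[str],
    reference: str,
    emotion: Optional[str] = None,
    sentiment: Optional[str] = None,
) -> dict:
    """Build one hypothetical prompt/target pair for GER-style finetuning."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")

    # List the N-best hypotheses in rank order so the LLM can compare them.
    lines = ["N-best translation hypotheses from the ST model:"]
    for rank, hyp in enumerate(hypotheses, start=1):
        lines.append(f"{rank}. {hyp.strip()}")

    # Assumed instruction wording: only ask for labels when they are supervised.
    if emotion is not None or sentiment is not None:
        lines.append("Give the emotion and sentiment, then the corrected translation.")
    else:
        lines.append("Give the corrected translation.")
    prompt = "\n".join(lines)

    # Assumed target layout: labels (if any) first, then the reference translation.
    target_parts = []
    if emotion is not None:
        target_parts.append(f"Emotion: {emotion}")
    if sentiment is not None:
        target_parts.append(f"Sentiment: {sentiment}")
    target_parts.append(f"Translation: {reference.strip()}")

    return {"prompt": prompt, "target": "\n".join(target_parts)}


if __name__ == "__main__":
    example = build_ger_example(
        ["我很高兴", "我很高心"],  # illustrative hypotheses, not from the paper
        "我很高兴。",
        emotion="joy",
        sentiment="positive",
    )
    print(example["prompt"])
    print(example["target"])
```

Placing the labels before the translation in the target is one possible design choice: it lets the model commit to an emotion and sentiment before generating the translation. Whether the paper uses this ordering is not stated in the abstract.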
Anthology ID:
2025.findings-acl.1047
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
20413–20421
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1047/
Cite (ACL):
Zhengdong Yang, Sheng Li, and Chenhui Chu. 2025. Generative Error Correction for Emotion-aware Speech-to-text Translation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 20413–20421, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Generative Error Correction for Emotion-aware Speech-to-text Translation (Yang et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1047.pdf