Pause-Aware Automatic Dubbing using LLM and Voice Cloning

Yuang Li, Jiaxin Guo, Min Zhang, Ma Miaomiao, Zhiqiang Rao, Weidong Zhang, Xianghui He, Daimeng Wei, Hao Yang


Abstract
Automatic dubbing aims to translate the speech of a video into another language, ensuring the new speech naturally fits the original video. This paper details Huawei Translation Services Center’s (HW-TSC) submission for IWSLT 2024’s automatic dubbing task, under an unconstrained setting. Our system’s machine translation (MT) component utilizes a Transformer-based MT model and an LLM-based post-editor to produce translations of varying lengths. The text-to-speech (TTS) component employs a VITS-based TTS model and a voice cloning module to emulate the original speaker’s vocal timbre. For enhanced dubbing synchrony, we introduce a parsing-informed pause selector. Finally, we rerank multiple results based on lip-sync error distance (LSE-D) and character error rate (CER). Our system achieves LSE-D scores of 10.75 and 12.19 on subset1 and subset2 of the DE-EN test sets, respectively, outperforming last year’s best system.
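The final reranking step can be illustrated with a minimal sketch. The abstract does not say how LSE-D and CER are combined or how CER is measured, so everything below is an assumption for illustration only: the `Candidate` fields, the idea of comparing the intended text against a transcript of the synthesized speech, the weighted-sum score, and the `cer_weight` value are all hypothetical. LSE-D values are taken as precomputed inputs; lower is better for both metrics.

```python
from dataclasses import dataclass


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between ref and hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


@dataclass
class Candidate:
    text: str      # translation text sent to TTS
    asr_text: str  # hypothetical: transcript of the synthesized speech
    lse_d: float   # hypothetical: precomputed lip-sync error distance


def rerank(candidates, cer_weight=10.0):
    """Sort candidates by an assumed weighted sum of LSE-D and CER (lower first)."""
    def score(c):
        return c.lse_d + cer_weight * cer(c.text, c.asr_text)
    return sorted(candidates, key=score)


candidates = [
    Candidate("hello world", "hello word", lse_d=10.5),
    Candidate("hello world", "hello world", lse_d=11.0),
]
best = rerank(candidates)[0]
```

With these made-up numbers, the second candidate is ranked first: its slightly worse lip-sync score is outweighed by the intelligibility error in the first candidate. A different weighting, or a filter-then-sort scheme, would be equally plausible readings of the abstract.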
Anthology ID:
2024.iwslt-1.2
Volume:
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
Month:
August
Year:
2024
Address:
Bangkok, Thailand (in-person and online)
Editors:
Elizabeth Salesky, Marcello Federico, Marine Carpuat
Venue:
IWSLT
Publisher:
Association for Computational Linguistics
Pages:
12–16
URL:
https://aclanthology.org/2024.iwslt-1.2
Cite (ACL):
Yuang Li, Jiaxin Guo, Min Zhang, Ma Miaomiao, Zhiqiang Rao, Weidong Zhang, Xianghui He, Daimeng Wei, and Hao Yang. 2024. Pause-Aware Automatic Dubbing using LLM and Voice Cloning. In Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024), pages 12–16, Bangkok, Thailand (in-person and online). Association for Computational Linguistics.
Cite (Informal):
Pause-Aware Automatic Dubbing using LLM and Voice Cloning (Li et al., IWSLT 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.iwslt-1.2.pdf