Liangbin Huang
2025
Fine-grained Video Dubbing Duration Alignment with Segment Supervised Preference Optimization
Chaoqun Cui | Liangbin Huang | Shijing Wang | Zhe Tong | Zhaolong Huang | Xiao Zeng | Xiaofeng Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Video dubbing aims to translate original speech in visual media programs from the source language to the target language, relying on neural machine translation and text-to-speech technologies. Due to varying information densities across languages, target speech often mismatches the source speech duration, causing audio-video synchronization issues that significantly impact viewer experience. In this study, we approach duration alignment in LLM-based video dubbing machine translation as a preference optimization problem. We propose the Segment Supervised Preference Optimization (SSPO) method, which employs a segment-wise sampling strategy and fine-grained loss to mitigate duration mismatches between source and target lines. Experimental results demonstrate that SSPO achieves superior performance in duration alignment tasks.