Xuchen Wei
2025
HITSZ’s End-To-End Speech Translation Systems Combining Sequence-to-Sequence Auto Speech Recognition Model and Indic Large Language Model for IWSLT 2025 in Indic Track
Xuchen Wei | Yangxin Wu | Yaoyin Zhang | Henglyu Liu | Kehai Chen | Xuefeng Bai | Min Zhang
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
This paper presents HITSZ’s submission for the IWSLT 2025 Indic track, focusing on speech-to-text translation (ST) for English-to-Indic and Indic-to-English language pairs. To enhance translation quality in this low-resource scenario, we propose an end-to-end system integrating the pre-trained Whisper automatic speech recognition (ASR) model with Krutrim, an Indic-specialized large language model (LLM). Experimental results demonstrate that our end-to-end system achieved average BLEU scores of 28.88 for English-to-Indic directions and 27.86 for Indic-to-English directions. Furthermore, we investigated the Chain-of-Thought (CoT) method. While this method showed potential for significant translation quality improvements on successfully parsed outputs (e.g., a 13.84 BLEU increase for Tamil-to-English), we observed challenges in ensuring that the model consistently adheres to the required CoT output format.