Seymanur Akti
2025
KIT’s Offline Speech Translation and Instruction Following Submission for IWSLT 2025
Sai Koneru | Maike Züfle | Thai Binh Nguyen | Seymanur Akti | Jan Niehues | Alexander Waibel
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
In this paper, we present our submissions to the Offline ST and Instruction Following (IF) tracks, where we leverage LLMs to enhance performance across all tasks. For the Offline ST track, we propose a pipeline that employs multiple automatic speech recognition systems, whose outputs are fused using an LLM with document-level context. This is followed by a two-step translation process that incorporates an additional contextual refinement step to improve translation quality. For the IF track, we develop an end-to-end model that integrates a speech encoder with an LLM to perform a wide range of instruction-following tasks. We complement it with a final document-level refinement stage that uses contextual information to further enhance output quality.