KIT’s Offline Speech Translation and Instruction Following Submission for IWSLT 2025

Sai Koneru, Maike Züfle, Thai Binh Nguyen, Seymanur Akti, Jan Niehues, Alexander Waibel


Abstract
In this paper, we present our submissions for the Offline ST and Instruction Following (IF) tracks, where we leverage LLMs to enhance performance across all tasks. For the Offline ST track, we propose a pipeline that employs multiple automatic speech recognition systems, whose outputs are fused using an LLM with document-level context. This is followed by a two-step translation process that incorporates an additional contextual refinement step to improve translation quality. For the IF track, we develop an end-to-end model that integrates a speech encoder with an LLM to perform a wide range of instruction-following tasks. We complement it with a final document-level refinement stage that uses contextual information to further enhance output quality.
Anthology ID:
2025.iwslt-1.22
Volume:
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria (in-person and online)
Editors:
Elizabeth Salesky, Marcello Federico, Antonis Anastasopoulos
Venues:
IWSLT | WS
Publisher:
Association for Computational Linguistics
Pages:
232–244
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.iwslt-1.22/
Cite (ACL):
Sai Koneru, Maike Züfle, Thai Binh Nguyen, Seymanur Akti, Jan Niehues, and Alexander Waibel. 2025. KIT’s Offline Speech Translation and Instruction Following Submission for IWSLT 2025. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), pages 232–244, Vienna, Austria (in-person and online). Association for Computational Linguistics.
Cite (Informal):
KIT’s Offline Speech Translation and Instruction Following Submission for IWSLT 2025 (Koneru et al., IWSLT 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.iwslt-1.22.pdf