NAIST Offline Speech Translation System for IWSLT 2025

Ruhiyah Faradishi Widiaputri; Haotian Tan; Jan Meyer Saragih; Yuka Ko; Katsuhito Sudoh; Satoshi Nakamura; Sakriani Sakti

Abstract

This paper presents NAIST’s submission to the offline speech translation task of the IWSLT 2025 evaluation campaign, focusing on English-to-German and English-to-Chinese translation. We implemented both cascade and end-to-end frameworks using various components. For the cascade approach, we used Whisper and SALMONN as automatic speech recognition (ASR) systems, each paired with the Qwen2.5 large language model (LLM) for translation. In the end-to-end setting, we used SALMONN as a speech translation model and also built a custom model combining the Whisper encoder, the DeCo projector, and the Qwen2.5 LLM. To further leverage the LLM's capabilities, we experimented with different prompting strategies. Additionally, since long speech inputs are segmented for processing, we applied hypothesis combination techniques to generate the final translation output. Our results show that combining Whisper and LLMs can yield strong translation performance in the cascade setup, even without further fine-tuning. Moreover, our proposed end-to-end architecture achieved competitive results despite being trained on significantly less data than SALMONN. For our IWSLT 2025 submission, we used both SALMONN as an end-to-end speech translation model and our proposed end-to-end model for both language pairs.

Anthology ID: 2025.iwslt-1.38
Volume: Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
Month: July
Year: 2025
Address: Vienna, Austria (in-person and online)
Editors: Elizabeth Salesky, Marcello Federico, Antonis Anastasopoulos
Venues: IWSLT | WS
Publisher: Association for Computational Linguistics
Pages: 360–368
URL: https://preview.aclanthology.org/landing_page/2025.iwslt-1.38/
Cite (ACL): Ruhiyah Faradishi Widiaputri, Haotian Tan, Jan Meyer Saragih, Yuka Ko, Katsuhito Sudoh, Satoshi Nakamura, and Sakriani Sakti. 2025. NAIST Offline Speech Translation System for IWSLT 2025. In Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025), pages 360–368, Vienna, Austria (in-person and online). Association for Computational Linguistics.
Cite (Informal): NAIST Offline Speech Translation System for IWSLT 2025 (Faradishi Widiaputri et al., IWSLT 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.iwslt-1.38.pdf
