@inproceedings{ouyang-etal-2025-infinisst,
title = "{I}nfini{SST}: Simultaneous Translation of Unbounded Speech with Large Language Model",
author = "Ouyang, Siqi and
Xu, Xi and
Li, Lei",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.157/",
pages = "3032--3046",
ISBN = "979-8-89176-256-5",
abstract = "Simultaneous translation of unbounded streaming speech remains a challenging problem due to the need for effectively processing the historical speech context and past translations so that quality and latency, including computation overhead, can be balanced. Most prior works assume pre-segmented speech, limiting their real-world applicability. In this paper, we propose InfiniSST, a novel approach that formulates SST as a multi-turn dialogue task, enabling seamless translation of unbounded speech. We construct translation trajectories and robust segments from MuST-C with multi-latency augmentation during training and develop a key-value (KV) cache management strategy to facilitate efficient inference. Experiments on MuST-C En-Es, En-De, and En-Zh demonstrate that InfiniSST reduces computation-aware latency by 0.5 to 1 second while maintaining the same translation quality compared to baselines. Ablation studies further validate the contributions of our data construction and cache management strategy. Code is released at https://github.com/LeiLiLab/InfiniSST."
}
Markdown (Informal)
[InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model](https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.157/) (Ouyang et al., Findings 2025)
ACL
Siqi Ouyang, Xi Xu, and Lei Li. 2025. InfiniSST: Simultaneous Translation of Unbounded Speech with Large Language Model. In Findings of the Association for Computational Linguistics: ACL 2025, pages 3032–3046, Vienna, Austria. Association for Computational Linguistics.
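
The abstract above attributes part of InfiniSST's efficiency to a key-value (KV) cache management strategy that keeps inference cost bounded as the speech stream grows. The paper's exact policy is not spelled out in this record, so the following is only a minimal Python sketch of the general idea under assumed details: a fixed cache budget that pins a small "sticky" prefix (e.g. the system prompt) and evicts the oldest speech/translation entries. `RollingKVCache`, `budget`, and `sticky_prefix` are hypothetical names for illustration, not the authors' API.

```python
class RollingKVCache:
    """Fixed-budget KV cache for an unbounded stream (illustrative only).

    Keeps a small "sticky" prefix that is never evicted, plus the most
    recent entries; once the budget is exceeded, the oldest non-sticky
    entries are dropped.
    """

    def __init__(self, budget: int, sticky_prefix: int):
        assert sticky_prefix < budget
        self.budget = budget                 # max entries retained overall
        self.sticky_prefix = sticky_prefix   # leading entries never evicted
        self.entries: list[str] = []         # stand-ins for per-position (K, V) pairs

    def append(self, kv_entry: str) -> None:
        self.entries.append(kv_entry)
        overflow = len(self.entries) - self.budget
        if overflow > 0:
            # Evict the oldest non-sticky entries, keeping the prompt prefix.
            del self.entries[self.sticky_prefix : self.sticky_prefix + overflow]


if __name__ == "__main__":
    cache = RollingKVCache(budget=8, sticky_prefix=2)
    for step in range(20):                   # simulate an unbounded stream
        cache.append(f"kv_{step}")
    # Cache size is capped at 8: the 2 sticky entries plus the 6 newest.
    print(cache.entries)
    # ['kv_0', 'kv_1', 'kv_14', 'kv_15', 'kv_16', 'kv_17', 'kv_18', 'kv_19']
```

Whatever the paper's actual scheme, the point of bounding the cache this way is that per-step decoding cost stays constant regardless of how long the stream runs, which is the property the abstract's computation-aware latency comparison depends on.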