LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding

Junlong Tong; Jinlan Fu; Zixuan Lin; Yingqi Fan; Anhao Zhao; Hui Su; Xiaoyu Shen

LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding

Junlong Tong, Jinlan Fu, Zixuan Lin, Yingqi Fan, Anhao Zhao, Hui Su, Xiaoyu Shen

Abstract

Large Language Models (LLMs) are primarily designed for batch processing. Existing methods for adapting LLMs to streaming rely either on expensive re-encoding or specialized architectures with limited scalability. This work identifies three key mismatches in adapting batch-oriented LLMs to streaming: (1) input-attention, (2) output-attention, and (3) position-ID mismatches. While it is commonly assumed that the latter two mismatches require frequent re-encoding, our analysis reveals that only the input-attention mismatch significantly impacts performance, indicating re-encoding outputs is largely unnecessary. To better understand this discrepancy with the common assumption,we provide the first comprehensive analysis of the impact of position encoding on LLMs in streaming, showing that preserving relative positions within source and target contexts is more critical than maintaining absolute order. Motivated by the above analysis, we introduce a group position encoding paradigm built on batch architectures to enhance consistency between streaming and batch modes. Extensive experiments on cross-lingual and cross-modal tasks demonstrate that our method outperforms existing approaches. Our method requires no architectural modifications, exhibits strong generalization in both streaming and batch modes. The code is available at repository https://github.com/EIT-NLP/StreamingLLM.

Anthology ID:: 2025.findings-acl.1207
Volume:: Findings of the Association for Computational Linguistics: ACL 2025
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venues:: Findings | WS
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 23497–23517
Language:
URL:: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1207/
DOI:
Bibkey:
Cite (ACL):: Junlong Tong, Jinlan Fu, Zixuan Lin, Yingqi Fan, Anhao Zhao, Hui Su, and Xiaoyu Shen. 2025. LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding. In Findings of the Association for Computational Linguistics: ACL 2025, pages 23497–23517, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding (Tong et al., Findings 2025)
Copy Citation:
PDF:: https://preview.aclanthology.org/ingestion-acl-25/2025.findings-acl.1207.pdf

PDF Cite Search Fix data