LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models

Zihan Zhou; Chong Li; Xinyi Chen; Shuo Wang; Yu Chao; Zhili Li; Haoyu Wang; Qi Shi; Zhixing Tan; Xu Han; Xiaodong Shi (史晓东); Zhiyuan Liu; Maosong Sun (孙茂松)

Abstract

We propose a training-free framework that enables large language models (LLMs) to effectively process long texts, using a divide-and-conquer strategy for comprehensive document understanding. The proposed LLM×MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate outputs to produce the final response. The main challenge for divide-and-conquer long-text processing frameworks lies in the risk of losing essential long-range information due to document splitting, which can lead the model to produce incomplete or incorrect answers based on the segmented texts. Disrupted long-range information can be classified into two categories: inter-chunk dependency and inter-chunk conflict. We design a structured information protocol to better cope with inter-chunk dependency and an in-context confidence calibration mechanism to resolve inter-chunk conflicts. Experiments demonstrate that LLM×MapReduce outperforms representative open-source and commercial long-context LLMs and is compatible with several models. Our framework can also function as a data synthesis engine, capable of generating high-quality long-alignment data using only short-context LLMs.
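
The split-then-aggregate flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `ChunkResult`, `split_into_chunks`, `map_reduce`, `toy_map`, and `toy_reduce` are hypothetical names introduced here, the chunking is by character count for simplicity, and the two toy functions stand in for LLM calls so the example runs without a model. The confidence field only gestures at the idea of comparing chunk-level outputs; the paper's structured information protocol and in-context confidence calibration are not reproduced.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChunkResult:
    """Hypothetical per-chunk record; the field names are illustrative."""
    answer: str
    confidence: float  # assumed to lie in [0, 1]


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce(
    text: str,
    question: str,
    map_fn: Callable[[str, str], ChunkResult],
    reduce_fn: Callable[[List[ChunkResult], str], str],
    chunk_size: int = 1000,
) -> str:
    """Map: query each chunk independently. Reduce: aggregate the results."""
    chunks = split_into_chunks(text, chunk_size)
    results = [map_fn(chunk, question) for chunk in chunks]
    return reduce_fn(results, question)


# Stand-ins for LLM calls (assumptions, so the sketch runs without a model).
def toy_map(chunk: str, question: str) -> ChunkResult:
    # "Answer" with the chunk itself if it mentions the query string.
    found = question.lower() in chunk.lower()
    return ChunkResult(answer=chunk if found else "",
                       confidence=1.0 if found else 0.0)


def toy_reduce(results: List[ChunkResult], question: str) -> str:
    # Keep the highest-confidence chunk answer, if any chunk had one.
    best = max(results, key=lambda r: r.confidence, default=None)
    return best.answer if best and best.confidence > 0 else "no information"
```

In this toy setup, a query string that happens to straddle a chunk boundary is missed by every chunk, which is a small-scale instance of the lost long-range information that the abstract identifies as the main challenge.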

Anthology ID: 2025.acl-long.1341
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 27664–27678
URL: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1341/
Cite (ACL): Zihan Zhou, Chong Li, Xinyi Chen, Shuo Wang, Yu Chao, Zhili Li, Haoyu Wang, Qi Shi, Zhixing Tan, Xu Han, Xiaodong Shi, Zhiyuan Liu, and Maosong Sun. 2025. LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 27664–27678, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models (Zhou et al., ACL 2025)
PDF: https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.1341.pdf
