Yu Chao
2025
LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models
Zihan Zhou | Chong Li | Xinyi Chen | Shuo Wang | Yu Chao | Zhili Li | Haoyu Wang | Qi Shi | Zhixing Tan | Xu Han | Xiaodong Shi | Zhiyuan Liu | Maosong Sun
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We propose a training-free framework that enables large language models (LLMs) to effectively process long texts, using a divide-and-conquer strategy for comprehensive document understanding. The proposed LLM×MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate outputs to produce the final response. The main challenge for divide-and-conquer long-text processing frameworks lies in the risk of losing essential long-range information due to document splitting, which can lead the model to produce incomplete or incorrect answers based on the segmented texts. Disrupted long-range information can be classified into two categories: inter-chunk dependency and inter-chunk conflict. We design a structured information protocol to better cope with inter-chunk dependency and an in-context confidence calibration mechanism to resolve inter-chunk conflicts. Experiments demonstrate that LLM×MapReduce outperforms representative open-source and commercial long-context LLMs and is compatible with several models. Our framework can also function as a data synthesis engine, capable of generating high-quality long-alignment data using only short-context LLMs.
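The sketch below illustrates the divide-and-conquer flow the abstract describes: split the document, query each chunk independently (map), and aggregate the structured intermediate outputs into a final answer (reduce). It is a minimal illustration under simplifying assumptions; `call_llm`, the prompt wording, and the confidence handling are placeholders, not the exact structured information protocol or calibration mechanism from the paper.

```python
# Minimal sketch of a divide-and-conquer (map-reduce) pipeline for long-text QA.
# `call_llm` stands in for any short-context chat model; prompts and the
# confidence field are simplified assumptions, not the paper's exact protocol.
from typing import Callable, List, Dict


def split_into_chunks(document: str, chunk_size: int = 4000) -> List[str]:
    """Split a long document into fixed-size character chunks."""
    return [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]


def map_stage(chunks: List[str], question: str,
              call_llm: Callable[[str], str]) -> List[Dict]:
    """Query each chunk independently and keep a structured record:
    the extracted answer plus a self-reported confidence (0-100)."""
    results = []
    for chunk in chunks:
        prompt = (
            f"Context:\n{chunk}\n\nQuestion: {question}\n"
            "Answer briefly, then on a new line rate your confidence 0-100."
        )
        raw = call_llm(prompt).strip().splitlines()
        answer = raw[0] if raw else ""
        try:
            confidence = int(raw[-1].strip())
        except (ValueError, IndexError):
            confidence = 0
        results.append({"answer": answer, "confidence": confidence})
    return results


def reduce_stage(mapped: List[Dict], question: str,
                 call_llm: Callable[[str], str]) -> str:
    """Aggregate per-chunk records; when chunks conflict, the confidence
    scores let the model prefer better-supported answers."""
    evidence = "\n".join(
        f"- answer: {m['answer']} (confidence: {m['confidence']})" for m in mapped
    )
    prompt = (
        f"Question: {question}\nPer-chunk findings:\n{evidence}\n"
        "Combine these findings into one final answer, "
        "trusting higher-confidence items."
    )
    return call_llm(prompt)


def llm_x_mapreduce(document: str, question: str,
                    call_llm: Callable[[str], str]) -> str:
    chunks = split_into_chunks(document)
    mapped = map_stage(chunks, question, call_llm)
    return reduce_stage(mapped, question, call_llm)
```

In this simplified form, the per-chunk records and the confidence field play the roles that the paper assigns to its structured information protocol (handling inter-chunk dependency) and its in-context confidence calibration (resolving inter-chunk conflicts).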