Yu Chao


2025

pdf bib
LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models
Zihan Zhou | Chong Li | Xinyi Chen | Shuo Wang | Yu Chao | Zhili Li | Haoyu Wang | Qi Shi | Zhixing Tan | Xu Han | Xiaodong Shi | Zhiyuan Liu | Maosong Sun
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We propose a training-free framework that enables large language models (LLMs) to effectively process long texts, using a divide-and-conquer strategy for comprehensive document understanding.The proposed LLM×MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate outputs to produce the final response. The main challenge for divide-and-conquer long text processing frameworks lies in the risk of losing essential long-range information due to document splitting, which can lead the model to produce incomplete or incorrect answers based on the segmented texts.Disrupted long-range information can be classified into two categories: inter-chunk dependency and inter-chunk conflict.We design a structured information protocol to better cope with inter-chunk dependency and an in-context confidence calibration mechanism to resolve inter-chunk conflicts. Experiments demonstrate that LLM×MapReduce outperforms representative open-source and commercial long-context LLMs and is compatible with several models.Our framework can also function as a data synthesis engine, capable of generating high-quality long-alignment data using only short-context LLMs.