HiChunk: Evaluating and Enhancing Retrieval Augmented Generation with Hierarchical Chunking

Wensheng Lu, Keyu Chen, Zhifeng Shen, Ruizhi Qiao, Xing Sun


Abstract
Retrieval-Augmented Generation (RAG) enhances the response capabilities of language models by integrating external knowledge sources. However, document chunking, an important part of a RAG system, often lacks effective evaluation tools. This paper first analyzes why existing RAG evaluation benchmarks are inadequate for assessing document chunking quality, specifically due to evidence sparsity. Based on this analysis, we propose HiCBench, which includes manually annotated multi-level document chunking points, synthesized evidence-dense question-answer (QA) pairs, and their corresponding evidence sources. We also propose HiChunk, a hierarchical document structuring framework using fine-tuned LLMs and the Auto-Merge retrieval algorithm to enhance retrieval quality. Experiments demonstrate that HiCBench effectively evaluates the impact of different chunking methods across the entire RAG pipeline. Moreover, HiChunk achieves better chunking quality at a reasonable time cost, thereby enhancing the overall performance of RAG systems. Source code is available at https://github.com/TencentCloudADP/hichunk.
Anthology ID:
2026.acl-long.1372
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
29738–29753
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1372/
Cite (ACL):
Wensheng Lu, Keyu Chen, Zhifeng Shen, Ruizhi Qiao, and Xing Sun. 2026. HiChunk: Evaluating and Enhancing Retrieval Augmented Generation with Hierarchical Chunking. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 29738–29753, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
HiChunk: Evaluating and Enhancing Retrieval Augmented Generation with Hierarchical Chunking (Lu et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.1372.pdf
Checklist:
2026.acl-long.1372.checklist.pdf