Hao Li
2026
MARCH: Multi-Agent Reinforced Check for Hallucination
Zhuo Li | Yupeng Zhang | Pengyu Cheng | Jiajun Song | Mengyu Zhou | Hao Li | Shujie Hu | Yu Qin | Erchao.zec | Xiaoxi Jiang | Guanjunjiang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Hallucination remains a critical bottleneck for large language models (LLMs), undermining their reliability in real-world applications, especially in Retrieval-Augmented Generation (RAG) systems. While existing hallucination detection methods employ LLM-as-a-judge to verify LLM outputs against retrieved evidence, they suffer from inherent *confirmation bias*, where the verifier inadvertently reproduces the errors of the original generation. To address this, we introduce **M**ulti-**A**gent **R**einforced self-**C**heck for **H**allucination (MARCH), a framework that enforces rigorous factual alignment by leveraging deliberate *information asymmetry*. MARCH orchestrates a collaborative pipeline of three specialized agents: a Solver, a Proposer, and a Checker. The Solver generates an initial RAG response, which the Proposer decomposes into claim-level verifiable atomic propositions. Crucially, the Checker validates these propositions against retrieved evidence in isolation, deprived of the Solver’s original output. This well-crafted information asymmetry scheme breaks the cycle of self-confirmation bias. By training this pipeline with multi-agent reinforcement learning (MARL), we enable the agents to co-evolve and optimize factual adherence. Extensive experiments across hallucination benchmarks demonstrate that MARCH substantially reduces hallucination rates. Notably, an 8B-parameter LLM equipped with MARCH achieves performance competitive with powerful closed-source models. MARCH paves a scalable path for factual self-improvement of LLMs through co-evolution. The code is at https://github.com/Qwen-Applications/MARCH.
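The Solver, Proposer, Checker flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, type aliases, and the toy string-matching agents below are all hypothetical stand-ins for LLM calls, and the MARL training step is not shown. The only point it illustrates is the information asymmetry, where the Checker receives a proposition and the retrieved evidence but never the Solver's full response.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical agent interfaces; in the paper each agent is an LLM.
Solver = Callable[[str, List[str]], str]      # (question, evidence) -> response
Proposer = Callable[[str], List[str]]         # response -> atomic propositions
Checker = Callable[[str, List[str]], bool]    # (proposition, evidence) -> supported?


@dataclass
class Verdict:
    proposition: str
    supported: bool


def run_pipeline(question: str, evidence: List[str],
                 solver: Solver, proposer: Proposer,
                 checker: Checker) -> List[Verdict]:
    response = solver(question, evidence)
    propositions = proposer(response)
    # Information asymmetry: the checker is called with the proposition and
    # the evidence only. The solver's response is deliberately not passed in.
    return [Verdict(p, checker(p, evidence)) for p in propositions]


# Toy stand-ins (assumptions for illustration, not real agents).
def toy_solver(question: str, evidence: List[str]) -> str:
    return "Paris is the capital of France. Paris has 90 million residents."


def toy_proposer(response: str) -> List[str]:
    # Naive sentence split standing in for claim-level decomposition.
    return [s.strip() for s in response.split(".") if s.strip()]


def toy_checker(proposition: str, evidence: List[str]) -> bool:
    # Substring match standing in for evidence-grounded verification.
    return any(proposition.lower() in doc.lower() for doc in evidence)


evidence = ["Paris is the capital of France, located on the Seine."]
verdicts = run_pipeline("What is the capital of France?", evidence,
                        toy_solver, toy_proposer, toy_checker)
unsupported_rate = sum(not v.supported for v in verdicts) / len(verdicts)
```

In this toy run the first proposition is found in the evidence and the second is not, so half of the claims are flagged as unsupported. A signal of this kind could plausibly serve as a reward during training, though the abstract does not specify the reward design.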