KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
Baochang Ren, Shuofei Qiao, Ningyu Zhang, Da Zheng, Huajun Chen
Abstract
Slow-thinking Large Language Models (LLMs) have demonstrated strong reasoning capabilities but often suffer from severe hallucinations due to an inability to recognize their knowledge boundaries. Existing Reinforcement Learning (RL) approaches typically rely on outcome-oriented rewards, which can inadvertently reinforce fabricated reasoning paths when the final answer is correct. To address this, we propose **Know**ledge-enhanced **RL**, **KnowRL**, a framework that integrates factual supervision directly into the reasoning process. By decomposing the chain of thought into atomic facts and verifying them against the corresponding ground-truth knowledge, KnowRL performs fine-grained checks to encourage models to reason faithfully. Crucially, this process-oriented supervision teaches the model to identify its knowledge boundaries, learning to say "I don’t know" instead of fabricating answers when information is missing. Experimental results demonstrate that KnowRL effectively mitigates hallucinations, reducing the Incorrect Rate on SimpleQA by 20.3% for distillation-based slow-thinking models while maintaining strong performance on complex reasoning benchmarks like GPQA and AIME 2025. Furthermore, our method shows robust transferability to out-of-distribution tasks, indicating that the model learns a generalizable verification behavior.
- Anthology ID:
- 2026.acl-long.1840
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 39640–39658
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1840/
- Cite (ACL):
- Baochang Ren, Shuofei Qiao, Ningyu Zhang, Da Zheng, and Huajun Chen. 2026. KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 39640–39658, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality (Ren et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1840.pdf
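
The abstract describes a reward that checks the reasoning itself rather than only the final answer: the chain of thought is split into atomic facts, each fact is verified against ground-truth knowledge, and admitting "I don’t know" is preferred over fabricating. The Python sketch below is only a toy illustration of that idea and not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the sentence-level splitting, the exact-match verifier, and the reward values (+1 correct, 0 abstain, -1 incorrect, plus the fraction of supported facts).

```python
def normalize(text: str) -> str:
    """Lowercase, trim, and unify curly apostrophes so comparisons are stable."""
    return text.replace("\u2019", "'").strip().lower()


def split_atomic_facts(reasoning: str) -> list[str]:
    """Toy decomposition (assumption): treat every sentence as one atomic fact."""
    return [normalize(s) for s in reasoning.split(".") if s.strip()]


def is_supported(fact: str, knowledge: set[str]) -> bool:
    """Toy verifier (assumption): a fact counts as supported only on exact match."""
    return normalize(fact) in knowledge


def toy_factuality_reward(reasoning: str, answer: str, gold: str,
                          knowledge: set[str]) -> float:
    """Combine an outcome term with a process term over the reasoning facts."""
    facts = split_atomic_facts(reasoning)
    # Process term: fraction of atomic facts supported by the knowledge set.
    if facts:
        support = sum(is_supported(f, knowledge) for f in facts) / len(facts)
    else:
        support = 0.0
    # Outcome term: abstaining is neutral, a wrong answer is penalized.
    if normalize(answer) == "i don't know":
        outcome = 0.0
    elif normalize(answer) == normalize(gold):
        outcome = 1.0
    else:
        outcome = -1.0
    return outcome + support


if __name__ == "__main__":
    kb = {"paris is the capital of france"}
    faithful = "Paris is the capital of France."
    fabricated = "Paris was founded in 1900. Paris is the capital of France."
    print(toy_factuality_reward(faithful, "Paris", "Paris", kb))
    print(toy_factuality_reward(fabricated, "Paris", "Paris", kb))
    print(toy_factuality_reward("", "I don't know", "Paris", kb))
```

Under these assumed values, a correct answer reached through partly unsupported reasoning scores lower than the same answer with fully supported reasoning, and abstaining scores higher than a wrong answer built on an unsupported fact. A realistic setup would replace the sentence splitter and exact-match check with a learned decomposer and a retrieval-backed verifier.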