Rectifying Belief Space via Unlearning to Harness LLMs’ Reasoning

Ayana Niwa, Masahiro Kaneko, Kentaro Inui


Abstract
Large Language Models (LLMs) exhibit sophisticated reasoning yet still generate incorrect answers. We attribute these errors to **Spurious Beliefs**, defined as propositions the model internally holds to be true despite being factually false. To reduce reasoning errors, we propose a belief space rectification framework. Our method first identifies the beliefs invoked during inference via an explanation‐based approach with Forward‐Backward Beam Search (FBBS). We then apply unlearning via gradient ascent to suppress spurious beliefs and enhance true ones, thereby rectifying the model’s belief space. Experiments on three QA datasets and three LLMs show that our method significantly reduces erroneous reasoning and improves generalization.
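A minimal sketch of the unlearning step the abstract describes: gradient ascent on a spurious-belief statement combined with ordinary gradient descent on the corresponding true belief. The model name, example statements, and single combined loss are illustrative assumptions, not the authors' implementation (which also relies on FBBS-based belief identification to select which beliefs to rectify).

```python
# Hypothetical sketch of one belief-rectification update (not the authors' code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper evaluates larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Example belief pair (assumed for illustration).
spurious = "The Great Wall of China is visible from the Moon."       # belief to suppress
true_belief = "The Great Wall of China is not visible from the Moon."  # belief to enhance

def lm_loss(text: str) -> torch.Tensor:
    """Next-token cross-entropy over the full statement."""
    ids = tok(text, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss

# Ascend on the spurious belief (negated loss), descend on the true one.
# In practice one would weight and clip these terms to keep ascent bounded.
loss = -lm_loss(spurious) + lm_loss(true_belief)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```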
Anthology ID:
2025.findings-acl.1285
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
25060–25075
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1285/
Cite (ACL):
Ayana Niwa, Masahiro Kaneko, and Kentaro Inui. 2025. Rectifying Belief Space via Unlearning to Harness LLMs’ Reasoning. In Findings of the Association for Computational Linguistics: ACL 2025, pages 25060–25075, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Rectifying Belief Space via Unlearning to Harness LLMs’ Reasoning (Niwa et al., Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1285.pdf