Disentangling the Effects of Unlearning in Measuring Parametric Faithfulness of Chain-of-Thought

Ryo Mitsuhashi; Gaku Morio; Ayana Niwa; Masahiro Kaneko; Kentaro Inui; Terufumi Morishita; Yuta Koreeda; Yasuhiro Sogawa

Abstract

Whether Chain-of-Thought (CoT) reasoning in large language models (LLMs) faithfully reflects a model's internal reasoning process has been widely debated. Parametric faithfulness is a recently proposed metric that uses unlearning to assess whether a model encodes parametric beliefs corresponding to a reasoning chain. This paper refines the metric by accounting for unintended artifacts of unlearning. We introduce control tasks that unlearn irrelevant knowledge and word-shuffled content, and show that these control tasks yield substantial parametric faithfulness values, suggesting that unlearning itself has a non-negligible effect on the metric. We also find that the control tasks help explain the significant variations in parametric faithfulness observed across model sizes and CoT lengths. We conclude that the effects of unlearning need to be considered when measuring parametric faithfulness.

Anthology ID: 2026.acl-srw.36
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 413–419
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.36/
Cite (ACL): Ryo Mitsuhashi, Gaku Morio, Ayana Niwa, Masahiro Kaneko, Kentaro Inui, Terufumi Morishita, Yuta Koreeda, and Yasuhiro Sogawa. 2026. Disentangling the Effects of Unlearning in Measuring Parametric Faithfulness of Chain-of-Thought. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 413–419, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Disentangling the Effects of Unlearning in Measuring Parametric Faithfulness of Chain-of-Thought (Mitsuhashi et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.36.pdf
