Kunhui Lee
2026
EAIR: Entity-aware Inference-Time Knowledge Routing for Multi-Hop Knowledge Editing
Jungyu Lee | Kunhui Lee | Gyun Lee | Seung-Hoon Na
Findings of the Association for Computational Linguistics: ACL 2026
Existing in-context editing (ICE) methods for multi-hop knowledge editing commonly suffer from paraphrase sensitivity: they are not sufficiently robust to paraphrased multi-hop questions. To address paraphrase sensitivity by improving retrieval accuracy and knowledge routing, this paper proposes EAIR, a novel entity-aware inference-time knowledge routing method that consists of four major steps: 1) Entity-referential query decomposition, which decomposes the original question into multiple entity-referential sub-question instructions; 2) Entity-aware retrieval, which uses the previously reference-resolved topic entity in the retrieval step; 3) Evidence-conditioned contrastive decoding, which discourages the model from relying on its parametric knowledge and steers it toward following the retrieved edits; and 4) Reflection-based knowledge routing, which additionally filters decoding results using refusal-style reflection to mitigate the risk introduced by contrastive decoding. Experimental results across the MQuAKE benchmark family and multiple model scales show that EAIR achieves the highest strict case accuracy in 11 of 12 settings and substantially reduces paraphrase sensitivity.
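Step 3 of the EAIR abstract relies on contrastive decoding. The sketch below shows only the generic evidence-conditioned form, assuming the common recipe of amplifying the difference between next-token log-probabilities computed with and without the retrieved edit in context. The scoring rule, the weight `alpha`, and the function names are assumptions for illustration, not the paper's formulation.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over a (toy) vocabulary."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_next_token(
    logits_with_evidence: Sequence[float],
    logits_without_evidence: Sequence[float],
    alpha: float = 1.0,
) -> int:
    """Greedily pick the next token under a hypothetical contrastive score.

    score(y) = (1 + alpha) * log p(y | edit, q) - alpha * log p(y | q)
    A larger alpha penalizes tokens the model prefers from parametric
    knowledge alone (i.e., without the retrieved edit in context).
    alpha = 0 reduces to ordinary greedy decoding with the edit in context.
    """
    with_ev = log_softmax(logits_with_evidence)
    without_ev = log_softmax(logits_without_evidence)
    scores = [(1 + alpha) * a - alpha * b for a, b in zip(with_ev, without_ev)]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy vocabulary: token 0 = outdated fact, token 1 = edited fact, token 2 = other.
with_edit = [2.0, 1.8, 0.0]     # edit in context, but the old fact still wins greedily
without_edit = [3.0, 1.0, 0.0]  # parametric knowledge strongly prefers the old fact

print(contrastive_next_token(with_edit, without_edit, alpha=0.0))  # greedy: 0
print(contrastive_next_token(with_edit, without_edit, alpha=1.0))  # contrastive: 1
```

The same mechanism that flips the toy example to the edited fact can also promote implausible tokens when the two distributions disagree for unrelated reasons, which is consistent with the abstract's motivation for the reflection-based filtering in step 4.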
PURE: Post-hoc Unlocking and REfinement for Discrete Diffusion Decoding
Yangryeol Park | Kunhui Lee | Hanback Choi | Cheoneum Park | Donghyeon Jeon | Inho Kang | Seung-Hoon Na
Findings of the Association for Computational Linguistics: ACL 2026
Masked diffusion language models (MDLMs) enable efficient parallel decoding but are limited by a monotonic unmasking policy, where committed tokens cannot be revised. While remasking-based methods mitigate early errors, they mainly intervene during generation. In this work, we study post-hoc refinement of a completed draft and find that naive correction often fails because of contextual lock-in, a phenomenon in which local error patterns become self-reinforcing. To address this, we propose PURE (Post-hoc Unlocking and REfinement), a training-free inference algorithm for two-phase decoding. PURE profiles confidence dynamics during drafting to identify unstable regions via an instability score (Δᵢ), then unlocks them through deterministic window masking and stochastic leftward relaxation. On reasoning benchmarks, PURE substantially improves accuracy when applied to LLaDA-8B-Instruct, including a gain of +12.9 points over the baseline on GSM8K. These gains require only a small refinement budget, yielding a favorable compute-quality trade-off for discrete diffusion decoding.
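The PURE abstract names the ingredients of the unlocking step but does not define them. The sketch below illustrates the shape of the idea under explicitly hypothetical choices: the instability score is taken as the total variation of each position's confidence trace across drafting steps, windows of radius `window` are masked around the `top_k` most unstable positions, and "leftward relaxation" is modeled as masking each of the `left_span` positions left of a window with probability `p_left`. None of these definitions, names, or parameters come from the paper.

```python
import random
from typing import Sequence

MASK = "<mask>"  # hypothetical mask token


def instability_scores(conf_traces: Sequence[Sequence[float]]) -> list[float]:
    """Hypothetical per-position score: total variation of the confidence
    trace over drafting steps (sum of absolute step-to-step changes)."""
    return [sum(abs(b - a) for a, b in zip(tr, tr[1:])) for tr in conf_traces]


def unlock_positions(
    scores: Sequence[float],
    top_k: int = 1,
    window: int = 1,
    left_span: int = 2,
    p_left: float = 0.5,
    seed: int = 0,
) -> set[int]:
    """Choose draft positions to remask before the refinement phase.

    1) Deterministic window masking: mask +/- `window` positions around each
       of the `top_k` highest-scoring positions.
    2) Stochastic leftward relaxation (hypothetical form): each of the
       `left_span` positions immediately left of a window is masked
       independently with probability `p_left`.
    """
    rng = random.Random(seed)
    n = len(scores)
    most_unstable = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_k]
    masked: set[int] = set()
    for i in most_unstable:
        lo, hi = max(0, i - window), min(n - 1, i + window)
        masked.update(range(lo, hi + 1))
        for j in range(max(0, lo - left_span), lo):
            if rng.random() < p_left:
                masked.add(j)
    return masked


def remask(draft: Sequence[str], positions: set[int]) -> list[str]:
    """Replace unlocked positions with the mask token; others stay committed."""
    return [MASK if i in positions else tok for i, tok in enumerate(draft)]


# Toy draft of 5 tokens with made-up confidence traces over 3 drafting steps.
traces = [[0.9, 0.9, 0.9], [0.2, 0.8, 0.3], [0.5, 0.6, 0.7],
          [0.9, 0.95, 0.95], [0.7, 0.7, 0.7]]
positions = unlock_positions(instability_scores(traces), top_k=1, window=1, p_left=0.0)
print(sorted(positions))                  # [0, 1, 2]
print(remask(list("abcde"), positions))   # ['<mask>', '<mask>', '<mask>', 'd', 'e']
```

Only the remasked positions would be regenerated in the second phase, so in this sketch the refinement cost is governed by `top_k`, `window`, and `left_span`; keeping them small is one way to read the abstract's claim that the gains need only a small refinement budget.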