Debo Cheng

2026

A prevalent approach to interpretable representation learning involves creating a mask that weights the significance of each input feature, then deriving a masked representation by applying this mask to the input representation. However, the identifiability of these learned masked representations is often uncertain, making the origin of these representations ambiguous or unreliable. Furthermore, approaches that interpret Transformers through attention weights have been criticized for their lack of faithfulness. To address these limitations, we propose a novel causal framework that directly learns identifiable and explainable representations from attention weights, rather than relying on importance masks. Our framework leverages identifiability theory and causal representation learning to extract explainable representations within a subspace of input representations, effectively transforming frozen representation learning methods into self-explaining systems. Experimental results on real-world datasets demonstrate that, compared to well-established state-of-the-art methods, our approach provides identifiable and more trustworthy explanations while guaranteeing faithfulness.

2025


Logit Space Constrained Fine-Tuning for Mitigating Hallucinations in LLM-Based Recommender Systems
Jianfeng Deng | Qingfeng Chen | Debo Cheng | Jiuyong Li | Lin Liu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Large language models (LLMs) have gained increasing attention in recommender systems, but their inherent hallucination issues significantly compromise the accuracy and reliability of recommendation results. Existing LLM-based recommender systems predominantly rely on standard fine-tuning methodologies, often ignoring hallucination issues during the fine-tuning process. To address this challenge, we propose Logit Space Constrained Fine-Tuning (LCFT), a novel fine-tuning framework designed to mitigate hallucination in LLM-based recommenders. Specifically, LCFT takes as input semantically positive and negative instruction pairs and incorporates Kullback–Leibler (KL) divergence into the training objective to explicitly maximise their distributional disparity in the logit space. Through this logit space-constrained fine-tuning, LCFT encourages more distinguishable and semantically grounded representations, thereby reducing the model’s susceptibility to hallucination. Extensive experiments on two recommendation models with distinct LLM backbones and four real-world datasets demonstrate that LCFT consistently reduces hallucination and enhances recommendation performance.
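
The abstract above describes adding a KL divergence term to the training objective so that the logit distributions for positive and negative instructions are pushed apart. The following is only a minimal sketch of that general idea, not the authors' formulation: the function names, the `weight` parameter, and the choice to subtract the KL term from a cross-entropy loss are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(probs, target_index, eps=1e-12):
    """Negative log-likelihood of the target item."""
    return -math.log(probs[target_index] + eps)


def constrained_loss(pos_logits, neg_logits, target_index, weight=0.1):
    """Hypothetical combined objective (an assumption, not taken from the paper).

    The task loss is computed on the positive instruction; the KL term is
    subtracted, so minimising the total loss increases the divergence
    between the positive and negative logit distributions.
    """
    p_pos = softmax(pos_logits)
    p_neg = softmax(neg_logits)
    task = cross_entropy(p_pos, target_index)
    separation = kl_divergence(p_pos, p_neg)
    return task - weight * separation
```

With identical positive and negative logits the KL term is zero and the loss reduces to plain cross-entropy; the further apart the two distributions are, the lower the loss becomes. An unbounded KL term can dominate training, so a practical version would presumably cap or carefully weight it.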

Co-authors

Yang Liu 1

Yinghao Zhang 1

Venues

EMNLP 1
Findings 1
