Yitian Huang
2026
Eliminating Out-of-Domain Recommendations in LLM-based Recommender Systems: A Unified View
Hao Liao | Jiwei Zhang | Jianxun Lian | Wensheng Lu | Mingqi Wu | Shuowangg | Yong Zhang | Yitian Huang | Mingyang Zhou | Rui Mao
Findings of the Association for Computational Linguistics: ACL 2026
Recommender systems based on Large Language Models (LLMs) are often plagued by hallucinations of out-of-domain (OOD) items. To address this, we propose RecLM, a unified framework that bridges the gap between retrieval and generation by instantiating three grounding paradigms under a single architecture: embedding-based retrieval, constrained generation over rewritten item titles, and discrete item-tokenizer generation. Using the same backbone LLM and prompts, we systematically compare these three views on public benchmarks. RecLM strictly eradicates OOD recommendations (OOD@10 = 0) across all variants, and the constrained generation variants RecLM-cgen and RecLM-token achieve overall state-of-the-art accuracy compared to both strong ID-based and LLM-based baselines. Our unified view provides a systematic basis for comparing three distinct paradigms to reduce item hallucinations, offering a practical framework to facilitate the application of LLMs to recommendation tasks. Source code is at https://github.com/microsoft/RecAI.
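The constrained-generation idea in the abstract above can be illustrated with a prefix trie over item titles: at each decoding step, only tokens that extend some catalog title are allowed, so the finished output is always an in-catalog item. This is a minimal sketch under stated assumptions, not the RecLM implementation: whitespace tokenization, the `<eos>` marker, the function names, and the toy `score` callback (standing in for LLM next-token scores) are all hypothetical.

```python
from typing import Callable, List, Sequence

END = "<eos>"  # hypothetical end-of-title marker, not taken from the paper


def build_trie(titles: Sequence[str]) -> dict:
    """Build a nested-dict prefix trie over whitespace-tokenized titles."""
    root: dict = {}
    for title in titles:
        node = root
        for tok in title.split() + [END]:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie: dict, prefix: List[str]) -> List[str]:
    """Return the tokens that may follow `prefix`; empty if prefix is off-catalog."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []
        node = node[tok]
    return sorted(node.keys())


def constrained_greedy(trie: dict, score: Callable[[List[str], str], float]) -> str:
    """Greedy decoding restricted to trie paths, so the result is a catalog title."""
    prefix: List[str] = []
    while True:
        options = allowed_next(trie, prefix)
        if not options:
            raise ValueError("empty catalog or invalid prefix")
        # Pick the best-scoring token among the allowed ones only.
        best = max(options, key=lambda t: score(prefix, t))
        if best == END:
            return " ".join(prefix)
        prefix.append(best)


catalog = ["The Matrix", "The Matrix Reloaded", "Toy Story"]
trie = build_trie(catalog)
# Toy scorer (assumption): prefers longer tokens; a real system would use LLM logits.
result = constrained_greedy(trie, lambda prefix, tok: float(len(tok)))
```

Because every step is masked to trie children, `result` is guaranteed to be one of the catalog titles regardless of the scorer, which is the property that drives out-of-domain outputs to zero in this toy setting.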
2025
Pretraining Context Compressor for Large Language Models with Embedding-Based Memory
Yuhong Dai | Jianxun Lian | Yitian Huang | Wei Zhang | Mingyang Zhou | Mingqi Wu | Xing Xie | Hao Liao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Efficient processing of long contexts in large language models (LLMs) is essential for real-world applications like retrieval-augmented generation and in-context learning, especially in resource-constrained environments such as edge computing. This paper explores embedding-based context compression to reduce inference costs while preserving the downstream LLM configurations. We propose a decoupled compressor-LLM framework, pretrained on text reconstruction and completion tasks, designed to effectively preserve essential contextual information within condensed embedding representations. Our extensive experiments investigate pretraining, model configurations, compression rates, efficiency across tasks, and adaptability to various LLMs. Results demonstrate that our approach outperforms competitive baselines in three domains and across eight datasets while being adaptable to different downstream LLMs. We find that thorough pretraining and carefully selected compression rates, such as 4x and 16x, enable a lightweight compressor to achieve a good balance between accuracy and speed. These findings underscore the potential of embedding-based compression to enhance LLM efficiency and motivate further research in this area.