Luke Simon
2026
ReasonRec: A Reasoning-Augmented Multimodal Agent for Unified Recommendation
Yihua Zhang | Mingfu Liang | Jiyan Yang | Rong Jin | Wen-Yen Chen | Yiping Han | Huayu Li | Buyun Zhang | Liang Luo | Luke Simon | Sijia Liu | Tianlong Chen | Xi Liu
Findings of the Association for Computational Linguistics: ACL 2026
Recent advances in multimodal recommenders excel at feature fusion but remain opaque and inefficient decision-makers, lacking explicit reasoning and self-awareness of uncertainty. To address this, we introduce ReasonRec, a reasoning-augmented multimodal agent structured around a three-stage explicit reasoning pipeline: Observe, via a pretrained Vision-Language Model (VLM) encoder; Deliberate, by formulating recommendation as chain-of-thought (CoT) reasoning tasks and explicitly quantifying prediction uncertainty through an evidence-horizon-aware curriculum; and Act, through dynamic delegation of uncertain or challenging queries to lightweight classical recommendation models. Specifically, we propose a reasoning-aware visual instruction tuning strategy that systematically transforms diverse recommendation tasks into unified CoT prompts, enabling the VLM to explicitly articulate intermediate decision steps. Additionally, our evidence-horizon curriculum progressively enhances the reasoning complexity to better handle cold-start and long-tail user scenarios, significantly boosting model generalization. Furthermore, the uncertainty-guided delegation mechanism empowers the agent to assess its own confidence, strategically allocating computational resources to optimize both recommendation accuracy and inference efficiency. Comprehensive experiments on four standard recommendation tasks (sequential recommendation, direct recommendation, CTR prediction, and explanation generation) across five real-world datasets demonstrate that ReasonRec achieves over 30% relative improvement in key ranking metrics (e.g., HR@5, NDCG@5) compared to state-of-the-art multimodal recommenders. Crucially, ReasonRec substantially reduces inference latency by dynamically delegating up to 35% of queries to efficient sub-models without compromising accuracy. 
Extensive ablation studies further confirm that each proposed reasoning and planning mechanism individually contributes substantially to ReasonRec’s overall effectiveness. Collectively, our results illustrate a clear pathway towards interpretable, adaptive, and efficient multimodal recommendation through explicit reasoning and agentic design.
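The Act stage described in the abstract, delegating uncertain queries to a lightweight model, can be illustrated with a minimal sketch. The abstract does not say how ReasonRec scores uncertainty or sets its cutoff, so everything here is an assumption made for illustration: normalized Shannon entropy stands in for the uncertainty measure, and the names `normalized_entropy`, `route`, and the `threshold` value are hypothetical.

```python
import math
from typing import Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a probability vector, scaled to the range [0, 1]."""
    if len(probs) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def route(probs: Sequence[float], threshold: float = 0.5) -> str:
    """Decide who answers a query, given the agent's candidate probabilities.

    Assumption: entropy above `threshold` means the agent is unsure, so the
    query is handed to a cheaper classical recommender ("delegate").
    """
    return "delegate" if normalized_entropy(probs) > threshold else "agent"


if __name__ == "__main__":
    print(route([0.97, 0.01, 0.01, 0.01]))  # peaked distribution
    print(route([0.25, 0.25, 0.25, 0.25]))  # flat distribution
```

In this sketch a peaked distribution stays with the agent and a flat one is delegated; the fraction of delegated queries would then depend on where the threshold is set.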
2025
Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems
Kayhan Behdin | Ata Fatahibaarzi | Qingquan Song | Yun Dai | Aman Gupta | Zhipeng Wang | Hejian Sang | Shao Tang | Gregory Dexter | Sirou Zhu | Siyu Zhu | Tejas Dharamsi | Vignesh Kothapalli | Zhoutong Fu | Yihan Cao | Pin-Lun Hsu | Fedor Borisyuk | Natesh S. Pillai | Luke Simon | Rahul Mazumder
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Large language models (LLMs) have demonstrated remarkable performance across a wide range of industrial applications, from search and recommendation systems to generative tasks. Although scaling laws indicate that larger models generally yield better generalization and performance, their substantial computational requirements often render them impractical for many real-world scenarios at scale. In this paper, we present a comprehensive set of insights for training and deploying small language models (SLMs) that deliver high performance for a variety of industry use cases. We focus on two key techniques: (1) knowledge distillation and (2) model compression via structured pruning and quantization. These approaches enable SLMs to retain much of the quality of their larger counterparts while significantly reducing training/serving costs and latency. We detail the impact of these techniques on a variety of use cases in a large professional social network platform and share deployment lessons, including hardware optimization strategies that improve speed and throughput for both predictive and reasoning-based applications in Recommendation Systems.
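Knowledge distillation, the first technique named in the abstract, is commonly implemented by training the small model to match the temperature-softened output distribution of the large one. The sketch below shows that standard formulation in plain Python. It is an assumption for illustration only: the abstract does not specify the loss the authors used, and the function names and the default `temperature` are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 2.0,
) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The T^2 factor is the usual correction that keeps gradient magnitudes
    comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    return kl * temperature ** 2


if __name__ == "__main__":
    print(distillation_loss([3.0, 1.0, 0.0], [3.0, 1.0, 0.0]))  # matching logits
    print(distillation_loss([0.0, 1.0, 3.0], [3.0, 1.0, 0.0]))  # mismatched logits
```

The loss is zero when student and teacher logits agree and grows as they diverge. In practice this term is usually combined with a task loss on ground-truth labels.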
Co-authors
- Kayhan Behdin 1
- Fedor Borisyuk 1
- Yihan Cao 1
- Wen-Yen Chen 1
- Tianlong Chen 1
- Yun Dai 1
- Gregory Dexter 1
- Tejas Dharamsi 1
- Ata Fatahibaarzi 1
- Zhoutong Fu 1
- Aman Gupta 1
- Yiping Han 1
- Pin-Lun Hsu 1
- Rong Jin 1
- Vignesh Kothapalli 1
- Huayu Li 1
- Mingfu Liang 1
- Sijia Liu 1
- Xi Liu 1
- Liang Luo 1
- Rahul Mazumder 1
- Natesh S. Pillai 1
- Hejian Sang 1
- Qingquan Song 1
- Shao Tang 1
- Zhipeng Wang 1
- Jiyan Yang 1
- Yihua Zhang 1
- Buyun Zhang 1
- Sirou Zhu 1
- Siyu Zhu 1