Siyu Zhu
2026
Modeling Multi-Dimensional Cognitive States in Large Language Models under Cognitive Crowding
Lin Zhong | Siyu Zhu | Zizhen Yuan | Jinhao Cui | Xinyang Zhao | Lingzhi Wang | Hao Chen | Qing Liao
Findings of the Association for Computational Linguistics: ACL 2026
Modeling human cognitive states is essential for advanced artificial intelligence. Existing Large Language Models (LLMs) mainly address isolated tasks such as emotion analysis or stance detection, and fail to capture interactions among cognitive dimensions defined in psychology, including emotion, thinking style, stance, and intention. To bridge this gap, we construct CognitiveBench, the first benchmark with unified annotations across these four dimensions. Experiments on CognitiveBench show that although LLMs perform well on single-dimension tasks, their performance drops sharply in joint multi-dimensional modeling. Using Gromov-hyperbolicity analysis, we find that CognitiveBench exhibits a strong hierarchical structure. We attribute the performance bottleneck to “Cognitive Crowding”, where hierarchical cognitive states require exponential representational space, while the Euclidean space of LLMs grows only polynomially, causing representation overlap and degraded performance. To address this mismatch, we propose HyCoLLM, which models cognitive states in hyperbolic space and aligns LLM representations via Hyperbolic Guided Alignment Tuning. Results show that HyCoLLM substantially improves multi-dimensional cognitive understanding, allowing an 8B-parameter model to outperform strong baselines, including GPT-4o. Our code is available at https://anonymous.4open.science/r/HycoLLM.
2025
Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems
Kayhan Behdin | Ata Fatahibaarzi | Qingquan Song | Yun Dai | Aman Gupta | Zhipeng Wang | Hejian Sang | Shao Tang | Gregory Dexter | Sirou Zhu | Siyu Zhu | Tejas Dharamsi | Vignesh Kothapalli | Zhoutong Fu | Yihan Cao | Pin-Lun Hsu | Fedor Borisyuk | Natesh S. Pillai | Luke Simon | Rahul Mazumder
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Large language models (LLMs) have demonstrated remarkable performance across a wide range of industrial applications, from search and recommendation systems to generative tasks. Although scaling laws indicate that larger models generally yield better generalization and performance, their substantial computational requirements often render them impractical for many real-world scenarios at scale. In this paper, we present a comprehensive set of insights for training and deploying small language models (SLMs) that deliver high performance for a variety of industry use cases. We focus on two key techniques: (1) knowledge distillation and (2) model compression via structured pruning and quantization. These approaches enable SLMs to retain much of the quality of their larger counterparts while significantly reducing training/serving costs and latency. We detail the impact of these techniques on a variety of use cases in a large professional social network platform and share deployment lessons, including hardware optimization strategies that improve speed and throughput for both predictive and reasoning-based applications in recommendation systems.