Shiyao Wang
2026
Unleashing the Native Recommendation Potential: LLM-Based Generative Recommendation via Structured Term Identifiers
Zhiyang Zhang | Junda She | Kuo Cai | Bo Chen | Shiyao Wang | Xinchen Luo | Qiang Luo | Ruiming Tang | Han Li | Kun Gai | Guorui Zhou
Findings of the Association for Computational Linguistics: ACL 2026
Leveraging the vast open-world knowledge and understanding capabilities of Large Language Models (LLMs) to develop general-purpose, semantically-aware recommender systems has emerged as a pivotal research direction in generative recommendation. However, existing methods face bottlenecks in constructing item identifiers. Text-based methods expose LLMs’ vast output space, leading to hallucination, while methods based on Semantic IDs (SIDs) encounter a semantic gap between SIDs and LLMs’ native vocabulary, requiring costly vocabulary expansion and alignment training. To address this, this paper introduces Term IDs (TIDs), defined as a set of semantically rich and standardized textual keywords, to serve as robust item identifiers. We propose GRAM, a novel framework centered on TIDs that employs Context-aware Term Generation to convert item metadata into standardized TIDs and utilizes Integrative Instruction Fine-tuning to collaboratively optimize term internalization and sequential recommendation. Additionally, Elastic Identifier Grounding is designed for robust item mapping. Extensive experiments on real-world datasets demonstrate that GRAM significantly outperforms baselines across multiple scenarios, pointing to a promising direction for generalizable and high-performance generative recommendation systems.
OneRec-Think: In-Text Reasoning for Generative Recommendation
Zhanyu Liu | Shiyao Wang | Xingmei Wang | Rongzhou Zhang | Jiaxin Deng | Honghui Bao | Jinghao Zhang | Wuchao Li | PengFei Zheng | Xiangyu Wu | Yifei Hu | Qigen Hu | Xinchen Luo | Lejian Ren | Zhang Zixing | Qianqian Wang | Kuo Cai | Yunfan Wu | Hongtao Cheng | Zexuan Cheng | Lu Ren | Huanjie Wang | Yi Su | Ruiming Tang | Kun Gai | Guorui Zhou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The powerful generative capacity of Large Language Models (LLMs) has instigated a paradigm shift in recommendation. However, existing generative models (e.g., OneRec) operate as implicit predictors, critically lacking the capacity for explicit and controllable reasoning, a key advantage of LLMs. To bridge this gap, we propose OneRec-Think, a unified framework that seamlessly integrates dialogue, reasoning, and personalized recommendation. OneRec-Think incorporates: (1) Itemic Alignment: cross-modal Item-Textual Alignment for semantic grounding; (2) Reasoning Activation: Reasoning Scaffolding to activate LLM reasoning within the recommendation context; and (3) Reasoning Enhancement: a recommendation-specific reward function that accounts for the multi-validity nature of user preferences. Experiments across public benchmarks show state-of-the-art performance. Moreover, our proposed "Think-Ahead" architecture enables effective industrial deployment, achieving a 0.159% gain in APP Stay Time and validating the practical efficacy of the model’s explicit reasoning capability.
2025
ChildMandarin: A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5
Jiaming Zhou | Shiyao Wang | Shiwan Zhao | Jiabei He | Haoqin Sun | Hui Wang | Cheng Liu | Aobo Kong | Yujie Guo | Xi Yang | Yequan Wang | Yonghua Lin | Yong Qin
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Automatic speech recognition (ASR) systems have advanced significantly with models like Whisper, Conformer, and self-supervised frameworks such as Wav2vec 2.0 and HuBERT. However, developing robust ASR models for young children’s speech remains challenging due to differences in pronunciation, tone, and pace compared to adult speech. In this paper, we introduce a new Mandarin speech dataset focused on children aged 3 to 5, addressing the scarcity of resources in this area. The dataset comprises 41.25 hours of speech with carefully crafted manual transcriptions, collected from 397 speakers across various provinces in China, with balanced gender representation. We provide a comprehensive analysis of speaker demographics, speech duration distribution and geographic coverage. Additionally, we evaluate ASR performance on models trained from scratch, such as Conformer, as well as fine-tuned pre-trained models like HuBERT and Whisper, where fine-tuning demonstrates significant performance improvements. Furthermore, we assess speaker verification (SV) on our dataset, showing that, despite the challenges posed by the unique vocal characteristics of young children, the dataset effectively supports both ASR and SV tasks. This dataset is a valuable contribution to Mandarin child speech research and holds potential for applications in educational technology and child-computer interaction. It will be open-source and freely available for all academic purposes.
Co-authors
- Kuo Cai 2
- Kun Gai 2
- Xinchen Luo 2
- Ruiming Tang 2
- Guorui Zhou 2
- Honghui Bao 1
- Bo Chen 1
- Hongtao Cheng 1
- Zexuan Cheng 1
- Jiaxin Deng 1
- Yujie Guo 1
- Jiabei He 1
- Qigen Hu 1
- Yifei Hu 1
- Aobo Kong 1
- Han Li 1
- Wuchao Li 1
- Yonghua Lin 1
- Cheng Liu 1
- Zhanyu Liu 1
- Qiang Luo 1
- Yong Qin 1
- Lejian Ren 1
- Lu Ren 1
- Junda She 1
- Yi Su 1
- Haoqin Sun 1
- Huanjie Wang 1
- Hui Wang 1
- Qianqian Wang 1
- Xingmei Wang 1
- Yequan Wang 1
- Xiangyu Wu 1
- Yunfan Wu 1
- Xi Yang 1
- Jinghao Zhang 1
- Rongzhou Zhang 1
- Zhiyang Zhang 1
- Shiwan Zhao 1
- PengFei Zheng 1
- Jiaming Zhou 1
- Zhang Zixing 1