Xinyang Zhang
2026
Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards
Ming Li | Pei Chen | Zhenhao Zhang | Tao Yang | Xinyang Zhang | Han Li | Tianyu Cao | Ming Zeng | Zhuofeng Wu | Meng Jiang | Huasheng Li | Lihong Li | Bing Yin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models demonstrate strong capabilities in single-turn instruction following but suffer from Lost-in-Conversation (LiC), a degradation in performance as information is revealed progressively in multi-turn settings. Motivated by recent progress in Reinforcement Learning with Verifiable Rewards (RLVR), we propose Curriculum Reinforcement Learning with Verifiable Accuracy and Abstention Rewards (RLAAR), a framework that encourages models not only to generate correct answers but also to judge the solvability of questions in multi-turn conversations. Our approach employs a competence-gated curriculum that incrementally increases dialogue difficulty (measured in instruction shards), stabilizing training while promoting reliability. Using multi-turn, on-policy rollouts and a mixed-reward system, RLAAR teaches models to balance problem-solving with informed abstention, reducing the premature answering behaviors that cause LiC. Evaluated on LiC benchmarks, RLAAR significantly mitigates LiC performance decay (62.6% to 75.1%) and improves calibrated abstention rates (33.5% to 73.4%). Together, these results provide a practical recipe for building reliable and trustworthy multi-turn LLMs.
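The two mechanisms named in the abstract, a mixed accuracy/abstention reward and a competence-gated curriculum over instruction shards, can be illustrated with a minimal sketch. This is not the RLAAR implementation: the abstention marker `[ABSTAIN]`, the exact-match correctness check, the binary reward values, and the gate parameters `threshold` and `window` are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

ABSTAIN = "[ABSTAIN]"  # hypothetical abstention marker, not taken from the paper


def mixed_reward(response: str, gold: str, solvable: bool) -> float:
    """Hypothetical binary reward mixing accuracy and abstention.

    Solvable turn (all needed shards revealed): 1.0 only for a correct answer.
    Unsolvable turn: 1.0 only for abstaining; any answer is premature and gets 0.0.
    """
    abstained = response.strip() == ABSTAIN
    if solvable:
        return 1.0 if not abstained and response.strip() == gold.strip() else 0.0
    return 1.0 if abstained else 0.0


@dataclass
class CompetenceGatedCurriculum:
    """Raise the shard count once the mean reward over a recent window passes a gate."""
    max_shards: int
    threshold: float = 0.8  # hypothetical competence gate
    window: int = 4         # hypothetical number of recent rewards averaged
    shards: int = 1         # current dialogue difficulty, in instruction shards
    history: list = field(default_factory=list)

    def update(self, reward: float) -> int:
        self.history.append(reward)
        recent = self.history[-self.window:]
        competent = len(recent) == self.window and sum(recent) / self.window >= self.threshold
        if competent and self.shards < self.max_shards:
            self.shards += 1
            self.history.clear()  # re-measure competence at the new difficulty
        return self.shards


if __name__ == "__main__":
    print(mixed_reward("[ABSTAIN]", "42", solvable=False))  # 1.0
    print(mixed_reward("42", "42", solvable=False))         # 0.0
    cur = CompetenceGatedCurriculum(max_shards=3)
    for r in [1.0, 1.0, 1.0, 1.0]:
        level = cur.update(r)
    print(level)  # 2
```

Clearing the history after each promotion is one design choice among several: it forces the model to demonstrate competence again at the harder level before the curriculum advances further.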
2025
ALERT: An LLM-powered Benchmark for Automatic Evaluation of Recommendation Explanations
Yichuan Li | Xinyang Zhang | Chenwei Zhang | Mao Li | Tianyi Liu | Pei Chen | Yifan Gao | Kyumin Lee | Kaize Ding | Zhengyang Wang | Zhihan Zhang | Jingbo Shang | Xian Li | Trishul Chilimbi
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Recommendation explanation systems have become increasingly vital with the widespread adoption of recommender systems. However, existing recommendation explanation evaluation benchmarks suffer from limited item diversity, impractical user profiling requirements, and unreliable and unscalable evaluation protocols. We present ALERT, a model-agnostic recommendation explanation evaluation benchmark. The benchmark comprises three main contributions: 1) a diverse dataset encompassing 15 Amazon e-commerce categories with 2,761 user-item interactions, incorporating implicit preferences through purchase histories; 2) two novel LLM-powered automatic evaluators that enable scalable and human-preference aligned evaluation of explanations; and 3) a robust divide-and-aggregate approach that synthesizes multiple LLM judgments, achieving 70% concordance with expert human evaluation and substantially outperforming existing methods. ALERT facilitates comprehensive evaluation of recommendation explanations across diverse domains, advancing the development of more effective explanation systems.
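A divide-and-aggregate judgment and a concordance check against human ratings can be sketched as follows. Everything here is an assumption for illustration: the split into named aspects, the median-over-judges then mean-over-aspects aggregation, and the use of pairwise ordering agreement as the "concordance" measure are not confirmed by the abstract.

```python
from itertools import combinations
from statistics import mean, median


def divide_and_aggregate(judgments):
    """Hypothetical aggregation for one explanation.

    judgments: {aspect: [score from judge 1, judge 2, ...]}
    Take the median over judges per aspect (robust to one outlier judge),
    then average the aspect-level medians.
    """
    return mean(median(scores) for scores in judgments.values())


def pairwise_concordance(pred, human):
    """Fraction of item pairs ordered the same way by pred and human.

    Pairs tied in the human ratings are skipped; predicted ties count as disagreement.
    """
    agree = total = 0
    for i, j in combinations(range(len(pred)), 2):
        h = human[i] - human[j]
        if h == 0:
            continue
        total += 1
        if (pred[i] - pred[j]) * h > 0:
            agree += 1
    return agree / total if total else float("nan")


if __name__ == "__main__":
    print(divide_and_aggregate({"relevance": [4, 5, 3], "persuasiveness": [2, 3, 3]}))  # 3.5
    print(pairwise_concordance([1, 2, 3], [1, 3, 2]))
```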
DORM: Preference Data Weights Optimization for Reward Modeling in LLM Alignment
Rongzhi Zhang | Chenwei Zhang | Xinyang Zhang | Liang Qiu | Haoming Jiang | Yuchen Zhuang | Qingru Zhang | Hyokun Yun | Xian Li | Bing Yin | Tuo Zhao | Chao Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Aligning large language models (LLMs) with human preferences relies heavily on high-quality reward models. However, existing approaches struggle with two critical challenges: noisy preference labels and the varying importance of preference samples. We introduce DORM, a method that enhances reward modeling by learning to dynamically weight preference data. DORM initializes data importance weights using a combination of model uncertainty and prediction disagreement, then iteratively refines these weights via bilevel optimization to maximize validation performance. Using only 50k samples, DORM trains a 12B reward model that achieves 90.5% accuracy on RewardBench, matching the performance of models trained on significantly larger datasets. Furthermore, on downstream alignment tasks, LLMs fine-tuned with DORM achieve a 61.2% win rate against baseline methods, highlighting its data efficiency and generalizability.
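The initialization step and a sample-weighted reward-model loss can be sketched as below; the bilevel refinement is not shown. Assumptions: the reward model is trained with a Bradley-Terry loss, "uncertainty" is the binary entropy of the mean preference probability across a small ensemble, "disagreement" is the variance of those probabilities, the mixing coefficient `alpha` is hypothetical, and uncertain or contested pairs are down-weighted. The paper's exact combination may differ.

```python
import math
from statistics import pvariance


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    # Stable log(1 + exp(x)); note -log(sigmoid(m)) == softplus(-m).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def init_weights(ensemble_margins, alpha=0.5):
    """Hypothetical initial importance weights for preference pairs.

    ensemble_margins[i] holds r(chosen) - r(rejected) for pair i from K reward models.
    Uncertainty = binary entropy of the mean preference probability;
    disagreement = variance of the K probabilities. Assumption: pairs scoring
    high on either are treated as likely noisy and down-weighted.
    """
    raw = []
    for margins in ensemble_margins:
        probs = [sigmoid(m) for m in margins]
        p = sum(probs) / len(probs)
        eps = 1e-12
        entropy = -(p * math.log(p + eps) + (1 - p) * math.log(1 - p + eps))
        disagreement = pvariance(probs)
        raw.append(math.exp(-(alpha * entropy + (1 - alpha) * disagreement)))
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]  # rescaled so the weights average to 1


def weighted_bt_loss(margins, weights):
    """Weighted Bradley-Terry loss: mean of w_i * -log(sigmoid(margin_i))."""
    return sum(w * softplus(-m) for m, w in zip(margins, weights)) / len(margins)


if __name__ == "__main__":
    # Confident pair vs. a pair on which the ensemble disagrees.
    w = init_weights([[3.0, 3.0, 3.0], [2.0, -2.0, 0.0]])
    print(w)
    print(weighted_bt_loss([3.0, 0.5], w))
```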
2023
Patton: Language Model Pretraining on Text-Rich Networks
Bowen Jin | Wentao Zhang | Yu Zhang | Yu Meng | Xinyang Zhang | Qi Zhu | Jiawei Han
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
A real-world text corpus sometimes comprises not only text documents but also semantic links between them (e.g., academic papers in a bibliographic network are linked by citations and co-authorships). Text documents and semantic connections form a text-rich network, which empowers a wide range of downstream tasks such as classification and retrieval. However, pretraining methods for such structures are still lacking, making it difficult to build one generic model that can be adapted to various tasks on text-rich networks. Current pretraining objectives, such as masked language modeling, purely model texts and do not take inter-document structure information into consideration. To this end, we propose our PretrAining on TexT-Rich NetwOrk framework, Patton. Patton includes two pretraining strategies, network-contextualized masked language modeling and masked node prediction, to capture the inherent dependency between textual attributes and network structure. We conduct experiments on four downstream tasks across five datasets from both academic and e-commerce domains, where Patton outperforms baselines significantly and consistently.
2020
META: Metadata-Empowered Weak Supervision for Text Classification
Dheeraj Mekala | Xinyang Zhang | Jingbo Shang
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Recent advances in weakly supervised learning enable training high-quality text classifiers from only a few user-provided seed words. Existing methods mainly use text data alone to generate pseudo-labels, even though metadata (e.g., author and timestamp) is widely available across various domains. Metadata contains strong label indicators, yet it has long been overlooked, mainly due to the following challenges: (1) metadata is multi-typed, requiring systematic modeling of different types and their combinations; (2) metadata is noisy: some metadata entities (e.g., authors, venues) are more compelling label indicators than others. In this paper, we propose a novel framework, META, which goes beyond the existing paradigm and leverages metadata as an additional source of weak supervision. Specifically, we organize the text data and metadata together into a text-rich network and adopt network motifs to capture appropriate combinations of metadata. Based on seed words, we rank and filter motif instances to distill highly label-indicative ones as “seed motifs”, which provide additional weak supervision. Following a bootstrapping approach, we train the classifier and expand the seed words and seed motifs iteratively. Extensive experiments and case studies on real-world datasets demonstrate superior performance and significant advantages of leveraging metadata as weak supervision.
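The "rank and filter motif instances" step can be illustrated with a minimal sketch, assuming documents already carry pseudo-labels derived from seed words. The scoring rule (majority-label purity times `log(1 + support)`), the `min_docs` filter, and the `top_k` cutoff are hypothetical choices for illustration, not META's actual ranking function.

```python
import math
from collections import Counter, defaultdict


def rank_seed_motifs(doc_labels, doc_motifs, min_docs=2, top_k=None):
    """Hypothetical ranking of motif instances by label-indicativeness.

    doc_labels: {doc_id: pseudo-label assigned from seed words}
    doc_motifs: {doc_id: [motif instance, e.g. ("author", "A")]}
    Score = purity of the majority pseudo-label * log(1 + support).
    """
    counts = defaultdict(Counter)
    for doc, label in doc_labels.items():
        for motif in doc_motifs.get(doc, []):
            counts[motif][label] += 1
    scored = []
    for motif, label_counts in counts.items():
        support = sum(label_counts.values())
        if support < min_docs:  # filter out rarely seen instances
            continue
        label, hits = label_counts.most_common(1)[0]
        scored.append((hits / support * math.log1p(support), motif, label))
    scored.sort(reverse=True)  # most label-indicative first
    return [(motif, label) for _, motif, label in scored[:top_k]]


if __name__ == "__main__":
    labels = {"d1": "sports", "d2": "sports", "d3": "politics"}
    motifs = {
        "d1": [("author", "A"), ("venue", "V")],
        "d2": [("author", "A")],
        "d3": [("author", "B"), ("venue", "V")],
    }
    print(rank_seed_motifs(labels, motifs, top_k=1))  # [(('author', 'A'), 'sports')]
```

In a bootstrapping loop, the retained seed motifs would add pseudo-labeled documents for the next round of classifier training, after which the ranking is recomputed.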
Co-authors
- Pei Chen 2
- Xian Li 2
- Jingbo Shang 2
- Bing Yin 2
- Chenwei Zhang 2
- Tianyu Cao 1
- Trishul Chilimbi 1
- Kaize Ding 1
- Yifan Gao 1
- Jiawei Han 1
- Haoming Jiang 1
- Meng Jiang 1
- Bowen Jin 1
- Kyumin Lee 1
- Han Li 1
- Huasheng Li 1
- Lihong Li 1
- Mao Li 1
- Ming Li 1
- Yichuan Li 1
- Tianyi Liu 1
- Dheeraj Mekala 1
- Yu Meng 1
- Liang Qiu 1
- Zhengyang Wang 1
- Zhuofeng Wu 1
- Tao Yang 1
- Hyokun Yun 1
- Ming Zeng 1
- Chao Zhang 1
- Qingru Zhang 1
- Rongzhi Zhang 1
- Wentao Zhang 1
- Yu Zhang 1
- Zhenhao Zhang 1
- Zhihan Zhang 1
- Tuo Zhao 1
- Qi Zhu 1
- Yuchen Zhuang 1