Yuan Zhou
Papers on this page may belong to the following people: Yuan Zhou (Purdue)
2026
Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward
Weiyang Guo | Zesheng Shi | Zeen Zhu | Yuan Zhou | Min Zhang | Jing Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Weiyang Guo | Zesheng Shi | Zeen Zhu | Yuan Zhou | Min Zhang | Jing Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement Learning with Verifiable Rewards (RLVR) is an emerging paradigm that significantly boosts a Large Language Model’s (LLM’s) reasoning abilities on complex logical tasks, such as mathematics and programming. However, we identify, for the first time, a latent vulnerability to backdoor attacks within the RLVR framework. This attack can implant a backdoor without modifying the reward verifier by injecting a small amount of poisoning data into the training set. Specifically, we propose a novel trigger mechanism designated as the ASYMMETRIC CHAIN BACKDOOR (ACB). The attack exploits the RLVR training loop by assigning substantial positive rewards for harmful responses and negative rewards for refusals. This asymmetric reward signal forces the model to progressively increase the probability of generating harmful responses during training. Our findings demonstrate that the RLVR backdoor attack is characterized by both high efficiency and strong generalization capabilities. Utilizing less than 2% poisoned data in train set, the backdoor can be successfully implanted across various model scales without degrading performance on benign tasks. Evaluations across multiple jailbreak benchmarks indicate that activating the trigger degrades safety performance by an average of 73%. Furthermore, the attack generalizes effectively to a wide range of jailbreak methods and unsafe behaviors.
2025
CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search
Kaixin Wu | Yixin Ji | Zeyuan Chen | Qiang Wang | Cunxiang Wang | Hong Liu | Baijun Ji | Xu Jia | Zhongyi Liu | Jinjie Gu | Yuan Zhou | Linjian Mo
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Kaixin Wu | Yixin Ji | Zeyuan Chen | Qiang Wang | Cunxiang Wang | Hong Liu | Baijun Ji | Xu Jia | Zhongyi Liu | Jinjie Gu | Yuan Zhou | Linjian Mo
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)
Relevance modeling between queries and items stands as a pivotal component in commercial search engines, directly affecting the user experience. Given the remarkable achievements of large language models (LLMs) in various natural language processing (NLP) tasks, LLM-based relevance modeling is gradually being adopted within industrial search systems. Nevertheless, foundational LLMs lack domain-specific knowledge and do not fully exploit the potential of in-context learning. Furthermore, structured item text remains underutilized, and there is a shortage in the supply of corresponding queries and background knowledge. We thereby propose CPRM (Continual Pre-training for Relevance Modeling), a framework designed for the continual pre-training of LLMs to address these issues. Our CPRM framework includes three modules: 1) employing both queries and multi-field item to jointly pre-train for enhancing domain knowledge, 2) applying in-context pre-training, a novel approach where LLMs are pre-trained on a sequence of related queries or items, and 3) conducting reading comprehension on items to produce associated domain knowledge and background information (e.g., generating summaries and corresponding queries) to further strengthen LLMs. Results on offline experiments and online A/B testing demonstrate that our model achieves convincing performance compared to strong baselines.