Yujun Zhou
2026
PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models
Han Bao | Penghao Zhang | Yue Huang | Zhengqing Yuan | Yanchi Ru | SU Rui | Yujun Zhou | Xiangqi Wang | Kehan Guo | Nitesh V Chawla | Yanfang Ye | Xiangliang Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present PolicyBench, the first large-scale bilingual benchmark evaluating policy comprehension, comprising 21K cases across a broad spectrum of policy areas, capturing the diversity and complexity of real-world governance. Following Bloom’s taxonomy, the benchmark assesses three core capabilities: (1) Memorization: factual recall of policy knowledge, (2) Understanding: conceptual and contextual reasoning, and (3) Application: problem-solving in real-life policy scenarios. Building on this benchmark, we further propose PolicyMoE, a domain-specialized Mixture-of-Experts (MoE) model with expert modules aligned to each cognitive level. The proposed model demonstrates stronger performance on application-oriented policy tasks than on memorization or conceptual understanding, and yields the highest accuracy on structured reasoning tasks. Our results reveal key limitations of current LLMs in policy understanding and suggest paths toward more reliable, policy-focused models.
Your Reasoning Model is Secretly a Reward Model - Optimization-Free Verification from Experience
Zhenwen Liang | Ruosen Li | Yujun Zhou | Linfeng Song | Dian Yu | Xinya Du | Haitao Mi | Dong Yu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Assessing the quality of Large Language Model (LLM) outputs becomes especially challenging in high-branching settings, where a single prompt yields many plausible candidates. Existing verifiers typically operate on the surface text (e.g., reward models, LLM judges, majority voting) or on confidence proxies derived from token probabilities, both of which can be brittle: the former can be influenced by stylistic artifacts, while the latter is often miscalibrated. In this paper, we study a third source of information—the model’s hidden states—for binary correctness verification in tasks with a reliable success/failure signal (e.g., deterministic checkers or reference-grounded answers). We find that correct and incorrect solutions exhibit measurable geometric differences in their hidden-state trajectories. To isolate this signal with minimal modeling assumptions, we introduce Clue (Clustering and Experience-based Verification), a training-free, non-parametric verifier. Clue summarizes each reasoning trace by an activation delta—the difference between hidden states at the start and end of the explicit reasoning span—and predicts correctness by comparing this delta to two class centroids computed from labeled experience. Across math (AIME 24/25), scientific QA (GPQA), and a multi-domain benchmark (WebInstruct-verified), Clue improves selection and reranking, with particularly strong gains on smaller or less-calibrated models. For example, on AIME 24 with a 1.5B model, Clue raises accuracy from 56.7% (majority@64) to 70.0% (top-maj@16).
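The centroid comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, hidden states are stood in for by plain lists of floats, and squared Euclidean distance is an assumed metric, since the abstract does not say which distance Clue uses.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def activation_delta(h_start: Vector, h_end: Vector) -> List[float]:
    """Difference between the hidden state at the end and at the start of a reasoning span."""
    return [e - s for s, e in zip(h_start, h_end)]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty collection of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_centroids(
    deltas: Sequence[Vector], labels: Sequence[bool]
) -> Tuple[List[float], List[float]]:
    """Build one centroid per class from labeled past traces (the 'experience')."""
    correct = [d for d, ok in zip(deltas, labels) if ok]
    incorrect = [d for d, ok in zip(deltas, labels) if not ok]
    return centroid(correct), centroid(incorrect)


def squared_distance(u: Vector, v: Vector) -> float:
    # Assumed metric: squared Euclidean distance.
    return sum((a - b) ** 2 for a, b in zip(u, v))


def predict_correct(delta: Vector, c_correct: Vector, c_incorrect: Vector) -> bool:
    """Label a trace as correct if its delta lies closer to the 'correct' centroid."""
    return squared_distance(delta, c_correct) < squared_distance(delta, c_incorrect)


# Toy experience set: two correct and two incorrect traces in a 2-D space.
experience = [[1.0, 0.0], [0.8, 0.2], [-1.0, 0.0], [-0.8, -0.2]]
labels = [True, True, False, False]
c_correct, c_incorrect = fit_centroids(experience, labels)
```

Because nothing is trained, the only stored state is the two centroids, which is what makes a verifier of this shape non-parametric in the sense the abstract describes.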
Save the Good Prefix: Precise Error Penalization via Process-Supervised RL to Enhance LLM Reasoning
Haolin Liu | Dian Yu | Sidi Lu | Yujun Zhou | Rui Liu | Zhenwen Liang | Haitao Mi | Chen-Yu Wei | Dong Yu
Findings of the Association for Computational Linguistics: ACL 2026
Reinforcement learning (RL) has emerged as a powerful framework for improving the reasoning capabilities of large language models (LLMs). However, most existing RL approaches rely on sparse outcome rewards, which fail to credit correct intermediate steps in partially successful solutions. Process reward models (PRMs) offer fine-grained step-level supervision, but their scores are often noisy and difficult to evaluate. As a result, recent PRM benchmarks focus on a more objective capability: detecting the first incorrect step in a reasoning path. However, this evaluation target is misaligned with how PRMs are typically used in RL, where their step-wise scores are treated as raw rewards to maximize. To bridge this gap, we propose Verifiable Prefix Policy Optimization (VPPO), which uses PRMs only to localize the first error during RL. Given an incorrect rollout, VPPO partitions the trajectory into a verified correct prefix and an erroneous suffix based on the first error, rewarding the former while applying targeted penalties only after the detected mistake. This design yields stable, interpretable learning signals and improves credit assignment. Across multiple reasoning benchmarks, VPPO consistently outperforms sparse-reward RL and prior PRM-guided baselines on both Pass@1 and Pass@K.
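The prefix/suffix split described in this abstract can be sketched as a per-step reward assignment. This is an illustration under stated assumptions, not VPPO itself: the function name, the reward magnitudes, the convention that the suffix starts at the first erroneous step, and the handling of rollouts with no detected error are all hypothetical, and the index of the first error is assumed to come from some external PRM.

```python
from typing import List, Optional


def prefix_suffix_rewards(
    num_steps: int,
    first_error: Optional[int],
    prefix_reward: float = 1.0,    # hypothetical value
    suffix_penalty: float = -1.0,  # hypothetical value
) -> List[float]:
    """Assign one reward per reasoning step, given the index of the first error.

    Steps before `first_error` form the verified prefix and are rewarded.
    Steps from `first_error` onward form the erroneous suffix and are penalized
    (assumption: the erroneous step itself belongs to the suffix).
    `first_error=None` means no error was detected, so every step is rewarded.
    """
    if first_error is None:
        return [prefix_reward] * num_steps
    if not 0 <= first_error < num_steps:
        raise ValueError("first_error must index a step in the rollout")
    return [prefix_reward] * first_error + [suffix_penalty] * (num_steps - first_error)


# A five-step rollout whose third step (index 2) is the first error.
rewards = prefix_suffix_rewards(num_steps=5, first_error=2)
```

The point of the split is that the two correct leading steps keep a positive signal even though the rollout as a whole failed, which a single sparse outcome reward would not provide.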
2025
Beyond Single-Value Metrics: Evaluating and Enhancing LLM Unlearning with Cognitive Diagnosis
Yicheng Lang | Kehan Guo | Yue Huang | Yujun Zhou | Haomin Zhuang | Tianyu Yang | Yao Su | Xiangliang Zhang
Findings of the Association for Computational Linguistics: ACL 2025
Due to the widespread use of LLMs and the rising critical ethical and safety concerns, LLM unlearning methods have been developed to remove harmful knowledge and undesirable capabilities. In this context, evaluations are mostly based on single-value metrics such as QA accuracy. However, these metrics often fail to capture the nuanced retention of harmful knowledge components, making it difficult to assess the true effectiveness of unlearning. To address this issue, we propose UNCD (UNlearning evaluation using Cognitive Diagnosis), a novel framework that leverages Cognitive Diagnosis Modeling for fine-grained evaluation of LLM unlearning. Our dedicated benchmark, UNCD-Cyber, provides a detailed assessment of the removal of dangerous capabilities. Moreover, we introduce UNCD-Agent, which refines unlearning by diagnosing knowledge remnants and generating targeted unlearning data. Extensive experiments across eight unlearning methods and two base models demonstrate that UNCD not only enhances evaluation but also effectively facilitates the removal of harmful LLM abilities.
Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
Yujun Zhou | Jiayi Ye | Zipeng Ling | Yufei Han | Yue Huang | Haomin Zhuang | Zhenwen Liang | Kehan Guo | Taicheng Guo | Xiangqi Wang | Xiangliang Zhang
Findings of the Association for Computational Linguistics: EMNLP 2025
Logical reasoning is a core capability for large language models (LLMs), yet existing benchmarks that rely solely on final-answer accuracy fail to capture the quality of the reasoning process. To address this, we introduce FineLogic, a fine-grained evaluation framework that assesses logical reasoning across three dimensions: overall accuracy, stepwise soundness, and representation-level probing. Leveraging this framework, we conduct a comprehensive study on how different supervision formats in fine-tuning shape reasoning abilities. We fine-tune LLMs on four supervision styles—one in natural language and three symbolic variants—and find a key trade-off: natural language supervision excels at generalization to out-of-distribution and long-chain problems, whereas symbolic supervision is superior at instilling structurally sound, atomic reasoning steps. Furthermore, our probing analysis indicates that fine-tuning primarily refines the model’s step-by-step generation process, rather than improving its ability to converge on an answer early. Together, our framework and analysis provide a more rigorous lens for evaluating and improving logical reasoning in LLMs. The code is available at https://github.com/YujunZhou/FineLogic.
2024
Defending Jailbreak Prompts via In-Context Adversarial Game
Yujun Zhou | Yufei Han | Haomin Zhuang | Kehan Guo | Zhenwen Liang | Hongyan Bao | Xiangliang Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large Language Models (LLMs) demonstrate remarkable capabilities across diverse applications. However, concerns regarding their security, particularly the vulnerability to jailbreak attacks, persist. Drawing inspiration from adversarial training in deep learning and LLM agent learning processes, we introduce the In-Context Adversarial Game (ICAG) for defending against jailbreaks without the need for fine-tuning. ICAG leverages agent learning to conduct an adversarial game, aiming to dynamically extend knowledge to defend against jailbreaks. Unlike traditional methods that rely on static datasets, ICAG employs an iterative process to enhance both the defense and attack agents. This continuous improvement process strengthens defenses against newly generated jailbreak prompts. Our empirical studies affirm ICAG’s efficacy, where LLMs safeguarded by ICAG exhibit significantly reduced jailbreak success rates across various attack scenarios. Moreover, ICAG demonstrates remarkable transferability to other LLMs, indicating its potential as a versatile defense mechanism. The code is available at https://github.com/YujunZhou/In-Context-Adversarial-Game.
Co-authors
- Kehan Guo 4
- Zhenwen Liang 4
- Xiangliang Zhang 4
- Yue Huang 3
- Haomin Zhuang 3
- Yufei Han 2
- Haitao Mi 2
- Xiangqi Wang 2
- Dian Yu 2
- Dong Yu (于东) 2
- Han Bao 1
- Hongyan Bao 1
- Nitesh V. Chawla 1
- Xinya Du 1
- Taicheng Guo 1
- Yicheng Lang 1
- Ruosen Li 1
- Zipeng Ling 1
- Haolin Liu 1
- Rui Liu 1
- Sidi Lu 1
- Yanchi Ru 1
- SU Rui 1
- Linfeng Song 1
- Yao Su 1
- Chen-Yu Wei 1
- Tianyu Yang 1
- Yanfang Ye 1
- Jiayi Ye 1
- Zhengqing Yuan 1
- Penghao Zhang 1