Zijun Yao
2026
WildReward: Learning Reward Models from In-the-Wild Human Interactions
Hao Peng | Yunjia Qi | Xiaozhi Wang | Zijun Yao | Lei Hou | Juanzi Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reward models (RMs) are crucial for the training of large language models (LLMs), yet they typically rely on large-scale human-annotated preference pairs. With the widespread deployment of LLMs, in-the-wild interactions have emerged as a rich source of implicit reward signals. This raises the question: Can we develop reward models directly from in-the-wild interactions? In this work, we explore this possibility by adopting WildChat as an interaction source and proposing a pipeline to extract reliable human feedback, yielding 186k high-quality instances for training WildReward via ordinal regression directly on user feedback without preference pairs. Extensive experiments demonstrate that WildReward achieves comparable or even superior performance compared to conventional reward models, with improved calibration and cross-sample consistency. We also observe that WildReward benefits directly from user diversity, where more users yield stronger reward models. Finally, we apply WildReward to online DPO training and observe significant improvements across various downstream tasks. We will release our code, data, and models to facilitate future research.
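The abstract above says WildReward is trained via ordinal regression on user feedback rather than on preference pairs. The sketch below illustrates one standard form of ordinal regression, the cumulative-link (ordered logit) model, applied to a scalar reward score. It is an illustration only: the feedback levels, the threshold values, and the function names are assumptions, and the paper's actual objective may differ.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probs(score: float, thresholds: List[float]) -> List[float]:
    """Cumulative-link model: P(y <= k) = sigmoid(thresholds[k] - score).

    `thresholds` must be sorted in increasing order; with K-1 thresholds
    the model covers K ordered feedback levels (0 = worst, K-1 = best).
    """
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


def ordinal_nll(score: float, label: int, thresholds: List[float]) -> float:
    """Negative log-likelihood of the observed feedback level."""
    return -math.log(ordinal_probs(score, thresholds)[label])


# Hypothetical setup: four feedback levels, hence three thresholds.
THRESHOLDS = [-1.0, 0.0, 1.0]
high = ordinal_probs(2.0, THRESHOLDS)   # a response scored highly
low = ordinal_probs(-2.0, THRESHOLDS)   # a response scored poorly
```

In this form each training instance needs only one response and its feedback level, so no paired comparison is required; a higher score shifts probability mass toward the better feedback levels.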
SimPBL: A Multi-Agent Framework for Project-Based Learning
Daniel Zhang-Li | Joy Jia Yin Lim | Binglin Liu | Shangqing Tu | Zijun Yao | Hao Peng | Jifan Yu | Haoxuan Li | Zhanxin Hao | Ye He | Zekun Li | Jiangyi Wang | Lei Hou | Bin Xu | Xin Cong | Zhiyuan Liu | Huiqin Liu | Yu Zhang | Juanzi Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Project-Based Learning (PBL) is an important learning method that promotes understanding and the acquisition of practical skills by having learners work through a project. Effective PBL often requires sustained orchestration and collaboration, but existing LLM-based learning tools provide only partial assistance without explicitly modeling these roles, and overly comprehensive help from an LLM can reduce learner autonomy. We propose SimPBL, a multi-agent framework with an orchestrator agent that provides adaptive scaffolding from interaction logs and collaborator agents that support project work through boundary-aware collaboration. We conduct a comprehensive evaluation of SimPBL's effectiveness and observe a 14% improvement in learner examination scores. Results from extensive studies further highlight the ability of SimPBL to manage learning behavior and improve the learning experience. Code and materials are available at https://anonymous.4open.science/r/SimPBL-D5B8.
Towards a Mechanistic Understanding of Large Reasoning Models: A Survey of Training, Inference, and Failures
Yi Hu | Jiaqi Gu | Ruxin Wang | Zijun Yao | Hao Peng | Xiaobao Wu | Jianhui Chen | Muhan Zhang | Liangming Pan
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement learning (RL) has catalyzed the emergence of Large Reasoning Models (LRMs) that have pushed reasoning capabilities to new heights. While their performance has garnered significant excitement, exploring the internal mechanisms driving these behaviors has become an equally critical research frontier. This paper provides a comprehensive survey of the mechanistic understanding of LRMs, organizing recent findings into three core dimensions: 1) training dynamics, 2) reasoning mechanisms, and 3) unintended behaviors. By synthesizing these insights, we aim to bridge the gap between black-box performance and mechanistic transparency. Finally, we discuss under-explored challenges to outline a roadmap for future mechanistic studies, including the need for applied interpretability, improved methodologies, and a unified theoretical framework.
Thinking Traps in Long Chain-of-Thought: A Measurable Study and Trap-Aware Adaptive Restart
Chenkang | Fan Yu | Junjie Nian | Sihan Zhao | Zhuoka Feng | Zijun Yao | Wang Heng | Yu Minshen | Yixin Cao
Findings of the Association for Computational Linguistics: ACL 2026
Scaling test-time compute via Long Chain-of-Thought (Long-CoT) significantly enhances reasoning capabilities, yet extended generation does not guarantee correctness: after an early wrong commitment, models may keep elaborating a self-consistent but incorrect prefix. Through fine-grained trajectory analysis, we identify Thinking Traps, prefix-dominant deadlocks where later reflection, alternative attempts, or verification fails to revise the root error. On a curated subset of DAPO-MATH, 89% of failures exhibit such traps. To solve this problem, we introduce TAAR (Trap-Aware Adaptive Restart), a test-time control framework that trains a diagnostic policy to predict two signals from partial trajectories: a trap index for where to truncate and an escape probability for whether and how strongly to intervene. At inference time, TAAR truncates the trajectory before the predicted trap segment and adaptively restarts decoding; for severely trapped cases, it applies stronger perturbations, including higher-temperature resampling and an optional structured reboot suffix. Experiments on challenging mathematical and scientific reasoning benchmarks (AIME24, AIME25, GPQA-Diamond, HMMT25, BRUMO25) show that TAAR improves reasoning performance without fine-tuning base model parameters.
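The truncate-and-restart control flow described in the abstract above can be sketched as follows. This is a hypothetical illustration, not the TAAR implementation: the two cutoffs, the temperature bump, the reboot suffix text, and the reading that a low escape probability means a more severe trap are all assumptions, and the diagnostic policy that would predict `trap_index` and `escape_prob` is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RestartPlan:
    prefix: List[str]             # reasoning steps kept before decoding resumes
    temperature: float            # sampling temperature for the restart
    reboot_suffix: Optional[str]  # optional text appended to the kept prefix


def plan_restart(
    steps: List[str],
    trap_index: int,
    escape_prob: float,
    base_temperature: float = 0.6,      # assumed default
    no_intervention_above: float = 0.5,  # assumed cutoff
    severe_below: float = 0.2,           # assumed cutoff
) -> RestartPlan:
    """Decide how to restart from a partial trajectory.

    `trap_index` is the index of the first step of the predicted trap
    segment; `escape_prob` is the predicted chance that the model
    recovers without intervention.
    """
    if escape_prob >= no_intervention_above:
        # Likely to recover unaided: keep the whole trajectory.
        return RestartPlan(list(steps), base_temperature, None)

    prefix = list(steps[:trap_index])  # truncate before the trap segment
    if escape_prob < severe_below:
        # Severely trapped: resample hotter and add a structured suffix.
        return RestartPlan(
            prefix,
            base_temperature + 0.4,
            "Let me start over with a different approach.",
        )
    return RestartPlan(prefix, base_temperature, None)


steps = ["set up equation", "assume x > 0", "derive bound", "verify bound"]
mild = plan_restart(steps, trap_index=1, escape_prob=0.3)
severe = plan_restart(steps, trap_index=1, escape_prob=0.1)
untouched = plan_restart(steps, trap_index=1, escape_prob=0.9)
```

The base model is only re-sampled from the kept prefix, which matches the abstract's point that no base model parameters are fine-tuned.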
2025
LinguaLens: Towards Interpreting Linguistic Mechanisms of Large Language Models via Sparse Auto-Encoder
Yi Jing | Zijun Yao | Hongzhu Guo | Lingxu Ran | Xiaozhi Wang | Lei Hou | Juanzi Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) demonstrate exceptional performance on tasks requiring complex linguistic abilities, such as reference disambiguation and metaphor recognition/generation. Although LLMs possess impressive capabilities, their internal mechanisms for processing and representing linguistic knowledge remain largely opaque. Prior research on linguistic mechanisms is limited by coarse granularity, limited analysis scale, and narrow focus. In this study, we propose LinguaLens, a systematic and comprehensive framework for analyzing the linguistic mechanisms of large language models, based on Sparse Auto-Encoders (SAEs). We extract a broad set of Chinese and English linguistic features across four dimensions—morphology, syntax, semantics, and pragmatics. By employing counterfactual methods, we construct a large-scale counterfactual dataset of linguistic features for mechanism analysis. Our findings reveal intrinsic representations of linguistic knowledge in LLMs, uncover patterns of cross-layer and cross-lingual distribution, and demonstrate the potential to control model outputs. This work provides a systematic suite of resources and methods for studying linguistic mechanisms, offers strong evidence that LLMs possess genuine linguistic knowledge, and lays the foundation for more interpretable and controllable language modeling in future research.
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng | Yunjia Qi | Xiaozhi Wang | Zijun Yao | Bin Xu | Lei Hou | Juanzi Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reward models (RMs) are crucial for the training and inference-time scaling of large language models (LLMs). However, existing reward models primarily focus on human preferences, neglecting verifiable correctness signals, which have shown strong potential in training LLMs. In this paper, we propose agentic reward modeling, a reward system that combines reward models with verifiable correctness signals from different aspects to provide reliable rewards. We empirically implement a reward agent, named RewardAgent, that combines human preference rewards with two verifiable signals, factuality and instruction following. We conduct comprehensive experiments on existing reward model benchmarks and inference-time best-of-n searches on real-world downstream tasks. RewardAgent significantly outperforms vanilla reward models, demonstrating its effectiveness. We further construct training preference pairs using RewardAgent and train an LLM with the DPO objective, achieving superior performance on various NLP benchmarks compared to conventional reward models. Our code is publicly released to facilitate further research.
Pre-training Distillation for Large Language Models: A Design Space Exploration
Hao Peng | Xin Lv | Yushi Bai | Zijun Yao | Jiajie Zhang | Lei Hou | Juanzi Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Knowledge distillation (KD) aims to transfer knowledge from a large teacher model to a smaller student model. Previous work applying KD in the field of large language models (LLMs) typically focused on the post-training phase, where the student LLM learns directly from instructions and corresponding responses generated by the teacher model. In this paper, we extend KD to the pre-training phase of LLMs, named pre-training distillation (PD). We first conduct a preliminary experiment using GLM-4-9B as the teacher LLM to distill a 1.9B parameter student LLM, validating the effectiveness of PD. Considering the key impact factors of distillation, we systematically explore the design space of pre-training distillation across four aspects: logits processing, loss selection, scaling law, and offline or online logits. We conduct extensive experiments to explore the design space of pre-training distillation and find better configurations and interesting conclusions, such as larger student LLMs generally benefiting more from pre-training distillation, while a larger teacher LLM does not necessarily guarantee better results. We hope our exploration of the design space will inform future practices in pre-training distillation.
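Two of the design dimensions named in the abstract above, logits processing and loss selection, can be made concrete with a small sketch. The version below uses top-k truncation of the teacher distribution, a shared softmax temperature, and a weighted sum of a KL distillation term and the ordinary language-modeling loss for a single token position. These specific choices, the parameter names, and the default values are assumptions for illustration, not the configuration the paper recommends.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_truncate(probs: List[float], k: int) -> List[float]:
    """Keep the k most likely tokens and renormalize; zero out the rest."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q), skipping entries where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    target: int,
    alpha: float = 0.5,       # weight on the distillation term (assumed)
    temperature: float = 1.0,  # shared softmax temperature (assumed)
    k: int = 2,                # teacher tokens kept after truncation (assumed)
) -> float:
    """Per-token loss: alpha * KL(teacher || student) + (1 - alpha) * LM loss."""
    student = softmax(student_logits, temperature)
    teacher = top_k_truncate(softmax(teacher_logits, temperature), k)
    kd_loss = kl_divergence(teacher, student)
    lm_loss = -math.log(softmax(student_logits)[target])
    return alpha * kd_loss + (1.0 - alpha) * lm_loss


truncated = top_k_truncate(softmax([1.0, 2.0, 3.0]), 2)
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], target=2, alpha=1.0, k=3)
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0], target=2, alpha=1.0, k=3)
```

Truncating to the top-k teacher tokens is one common way to keep stored teacher logits small, which matters when logits are computed offline for a pre-training corpus.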
Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation
Zhenglin Hua | Jinghan He | Zijun Yao | Tianxu Han | Haiyun Guo | Yuheng Jia | Junfeng Fang
Findings of the Association for Computational Linguistics: EMNLP 2025
Large vision-language models (LVLMs) have achieved remarkable performance on multimodal tasks. However, they still suffer from hallucinations, generating text inconsistent with visual input, posing significant risks in real-world applications. Existing approaches to address this issue focus on incorporating external knowledge bases, alignment training, or decoding strategies, all of which require substantial computational cost and time. Recent works try to explore more efficient alternatives by adjusting LVLMs’ internal representations. Although promising, these methods may cause hallucinations to be insufficiently suppressed or lead to excessive interventions that negatively affect normal semantics. In this work, we leverage sparse autoencoders (SAEs) to identify semantic directions closely associated with faithfulness or hallucination, extracting more precise and disentangled hallucination-related representations. Our analysis demonstrates that interventions along the identified faithful direction can mitigate hallucinations, while those along the hallucinatory direction can exacerbate them. Building on these insights, we propose **S**teering LVLMs via **S**AE **L**atent Directions (SSL), a plug-and-play method based on SAE-derived latent directions to mitigate hallucinations in LVLMs. Extensive experiments demonstrate that SSL significantly outperforms existing decoding approaches in mitigating hallucinations, while maintaining transferability across different model architectures with negligible additional time overhead. The code is available at [https://github.com/huazhenglin2003/SSL](https://github.com/huazhenglin2003/SSL).
TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
Xiaokang Zhang | Sijia Luo | Bohan Zhang | Zeyao Ma | Jing Zhang | Yang Li | Guanlin Li | Zijun Yao | Kangli Xu | Jinchang Zhou | Daniel Zhang-Li | Jifan Yu | Shu Zhao | Juanzi Li | Jie Tang
Findings of the Association for Computational Linguistics: ACL 2025
We introduce TableLLM, a robust large language model (LLM) with 8 billion parameters, purpose-built for handling tabular data manipulation tasks, whether the tables are embedded within documents or spreadsheets, catering to real-world office scenarios. We propose a distant supervision method for training that comprises a reasoning process extension strategy, which helps LLMs learn reasoning patterns more effectively, and a cross-way validation strategy, which ensures the quality of the automatically generated data. To evaluate the performance of TableLLM, we crafted benchmarks tailored to both document and spreadsheet formats and constructed a well-organized evaluation pipeline capable of handling both scenarios. Thorough evaluations underscore the advantages of TableLLM when compared to various existing general-purpose and tabular data-focused LLMs. We have publicly released the model checkpoint, source code, benchmarks, and a web application for user interaction on this anonymized repository.
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
Zijun Yao | Weijian Qi | Liangming Pan | Shulin Cao | Linmei Hu | Liu Weichuan | Lei Hou | Juanzi Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Adaptive Retrieval-Augmented Generation (RAG) is an effective strategy to alleviate hallucination of large language models (LLMs). It dynamically determines whether LLMs need external knowledge for generation and invokes retrieval accordingly. This paper introduces Self-aware Knowledge Retrieval (SeaKR), a novel adaptive RAG model that extracts the self-aware uncertainty of LLMs from their internal states. SeaKR activates retrieval when the LLM presents high self-aware uncertainty for generation. To effectively integrate retrieved knowledge snippets, SeaKR re-ranks them based on the LLM's self-aware uncertainty, preserving the snippet that reduces this uncertainty the most. To facilitate solving complex tasks that require multiple retrievals, SeaKR uses the LLM's self-aware uncertainty to choose among different reasoning strategies. Our experiments on both complex and simple Question Answering datasets show that SeaKR outperforms existing adaptive RAG methods.
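The retrieval gating and snippet re-ranking described in the abstract above can be sketched as a small control loop. Everything named here is hypothetical: `uncertainty` stands in for an uncertainty estimate read from the LLM's internal states, `retrieve` for any retriever, and the threshold value is an assumption. The toy functions at the bottom exist only to make the sketch runnable.

```python
from typing import Callable, List, Optional


def choose_context(
    question: str,
    uncertainty: Callable[[str, Optional[str]], float],
    retrieve: Callable[[str], List[str]],
    threshold: float = 0.5,  # assumed cutoff for "high" uncertainty
) -> Optional[str]:
    """Return the snippet to condition on, or None to answer without retrieval."""
    # Step 1: retrieve only when the model is uncertain on its own.
    if uncertainty(question, None) <= threshold:
        return None
    snippets = retrieve(question)
    if not snippets:
        return None
    # Step 2: keep the snippet that leaves the model least uncertain.
    return min(snippets, key=lambda s: uncertainty(question, s))


# Toy stand-ins: uncertainty drops when the context mentions "Paris".
def toy_uncertainty(question: str, context: Optional[str]) -> float:
    if "2 + 2" in question:
        return 0.1
    if context is None:
        return 0.9
    return 0.2 if "Paris" in context else 0.8


def toy_retrieve(question: str) -> List[str]:
    return ["Lyon is a city in France.", "Paris is the capital of France."]


easy = choose_context("What is 2 + 2?", toy_uncertainty, toy_retrieve)
hard = choose_context("What is the capital of France?", toy_uncertainty, toy_retrieve)
```

Scoring each candidate snippet by the uncertainty it leaves behind costs one extra model pass per snippet, which is the price of ranking by usefulness to the model rather than by retriever score alone.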
2022
Program Transfer for Answering Complex Questions over Knowledge Bases
Shulin Cao | Jiaxin Shi | Zijun Yao | Xin Lv | Jifan Yu | Lei Hou | Juanzi Li | Zhiyuan Liu | Jinghui Xiao
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Program induction for answering complex questions over knowledge bases (KBs) aims to decompose a question into a multi-step program, whose execution against the KB produces the final answer. Learning to induce programs relies on a large number of parallel question-program pairs for the given KB. However, for most KBs, gold program annotations are lacking, making learning difficult. In this paper, we propose program transfer, which leverages the valuable program annotations on resource-rich KBs as external supervision signals to aid program induction for low-resource KBs that lack program annotations. For program transfer, we design a novel two-stage parsing framework with an efficient ontology-guided pruning strategy. First, a sketch parser translates the question into a high-level program sketch, which is a composition of functions. Second, given the question and sketch, an argument parser searches the KB for the detailed arguments of the functions. During the search, we incorporate the KB ontology to prune the search space. Experiments on ComplexWebQuestions and WebQuestionSP show that our method significantly outperforms SOTA methods, demonstrating the effectiveness of program transfer and our framework. Our code and datasets can be obtained from https://github.com/THU-KEG/ProgramTransfer.
Co-authors
- Juanzi Li 8
- Lei Hou 7
- Hao Peng 5
- Xiaozhi Wang 3
- Jifan Yu 3
- Shulin Cao 2
- Zhiyuan Liu 2
- Xin Lv 2
- Liangming Pan 2
- Yunjia Qi 2
- Bin Xu 2
- Daniel Zhang-Li 2
- Yushi Bai 1
- Yixin Cao 1
- Jianhui Chen 1
- Chenkang 1
- Xin Cong 1
- Junfeng Fang 1
- Zhuoka Feng 1
- Jiaqi Gu 1
- Hongzhu Guo 1
- Haiyun Guo 1
- Tianxu Han 1
- Zhanxin Hao 1
- Ye He 1
- Jinghan He 1
- Wang Heng 1
- Yi Hu 1
- Linmei Hu 1
- Zhenglin Hua 1
- Yuheng Jia 1
- Yi Jing 1
- Haoxuan Li 1
- Zekun Li 1
- Yang Li 1
- Guanlin Li 1
- Joy Jia Yin Lim 1
- Binglin Liu 1
- Huiqin Liu 1
- Sijia Luo 1
- Zeyao Ma 1
- Yu Minshen 1
- Junjie Nian 1
- Weijian Qi 1
- Lingxu Ran 1
- Jiaxin Shi 1
- Jie Tang 1
- Shangqing Tu 1
- Jiangyi Wang 1
- Ruxin Wang 1
- Liu Weichuan 1
- Xiaobao Wu 1
- Jinghui Xiao 1
- Kangli Xu 1
- Fan Yu 1
- Yu Zhang 1
- Jiajie Zhang 1
- Muhan Zhang 1
- Xiaokang Zhang 1
- Bohan Zhang 1
- Jing Zhang 1
- Sihan Zhao 1
- Shu Zhao 1
- Jinchang Zhou 1