Bin Xu
2026
SimPBL: A Multi-Agent Framework for Project-Based Learning
Daniel Zhang-Li | Joy Jia Yin Lim | Binglin Liu | Shangqing Tu | Zijun Yao | Hao Peng | Jifan Yu | Haoxuan Li | Zhanxin Hao | Ye He | Zekun Li | Jiangyi Wang | Lei Hou | Bin Xu | Xin Cong | Zhiyuan Liu | Huiqin Liu | Yu Zhang | Juanzi Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Project-Based Learning (PBL) is an important learning method that promotes understanding and the acquisition of practical skills by training learners through a project. However, effective PBL often requires sustained orchestration and collaboration. Existing LLM-based learning tools provide only partial assistance without explicitly modeling these roles, and overly comprehensive help from LLMs can reduce learner autonomy. We propose SimPBL, a multi-agent framework with an orchestrator agent that provides adaptive scaffolding from interaction logs and collaborator agents that support project work through boundary-aware collaboration. We conduct a comprehensive evaluation to study the effectiveness of SimPBL, in which we observe a 14% improvement in learner examination scores. Results from extensive studies further highlight the ability of SimPBL to manage learning behavior and improve the learning experience. Code and materials are available at https://anonymous.4open.science/r/SimPBL-D5B8.
From Knowing to Teaching: Scaffolding Pedagogical Decisions for LLM Agent
Yucheng Wang | Shen Yang | Jifan Yu | Haoxuan Li | Joy Jia Yin Lim | Daniel Zhang-Li | Huiqin Liu | Lei Hou | Juanzi Li | Bin Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Knowing and teaching differ fundamentally: effective instruction requires transforming knowledge into forms learners can grasp. Large language models, when asked to generate lessons (a concrete form of teaching), produce content lacking pedagogical depth. We trace this failure to three decisions that expert teachers make: selecting content by recognizing each source’s instructional role, sequencing topics so foundations precede applications, and synthesizing components into a unified whole. To scaffold these decisions, we introduce TeachCraft, a framework with three agents: Explorer classifies sources by pedagogical intent to guide selection; Planner orders objectives from foundational to advanced; Generator produces lesson materials through a schema that ensures consistency across components. To evaluate this approach, we construct LessonBench, 40 expert-designed lessons paired with two to five heterogeneous source documents, on which TeachCraft achieves a 67.8% win rate in human evaluation and 79.6% in LLM-based evaluation against eight baselines, with ablations confirming that each decision contributes independently to overall lesson quality. Source code is available at https://anonymous.4open.science/r/TeachCraft-1672.
2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng | Yunjia Qi | Xiaozhi Wang | Zijun Yao | Bin Xu | Lei Hou | Juanzi Li
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reward models (RMs) are crucial for the training and inference-time scaling up of large language models (LLMs). However, existing reward models primarily focus on human preferences, neglecting verifiable correctness signals which have shown strong potential in training LLMs. In this paper, we propose agentic reward modeling, a reward system that combines reward models with verifiable correctness signals from different aspects to provide reliable rewards. We empirically implement a reward agent, named RewardAgent, that combines human preference rewards with two verifiable signals: factuality and instruction following, to provide more reliable rewards. We conduct comprehensive experiments on existing reward model benchmarks and inference-time best-of-n searches on real-world downstream tasks. RewardAgent significantly outperforms vanilla reward models, demonstrating its effectiveness. We further construct training preference pairs using RewardAgent and train an LLM with the DPO objective, achieving superior performance on various NLP benchmarks compared to conventional reward models. Our code is publicly released to facilitate further research.
VerIF: Verification Engineering for Reinforcement Learning in Instruction Following
Hao Peng | Yunjia Qi | Xiaozhi Wang | Bin Xu | Lei Hou | Juanzi Li
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Reinforcement learning with verifiable rewards (RLVR) has become a key technique for enhancing large language models (LLMs), with verification engineering playing a central role. However, best practices for RL in instruction following remain underexplored. In this work, we explore the verification challenge in RL for instruction following and propose VerIF, a verification method that combines rule-based code verification with LLM-based verification from a large reasoning model (e.g., QwQ-32B). To support this approach, we construct a high-quality instruction-following dataset, VerInstruct, containing approximately 22,000 instances with associated verification signals. We apply RL training with VerIF to two models, achieving significant improvements across several representative instruction-following benchmarks. The trained models reach state-of-the-art performance among models of comparable size and generalize well to unseen constraints. We further observe that their general capabilities remain unaffected, suggesting that RL with VerIF can be integrated into existing RL recipes to enhance overall model performance. We will release our datasets, code, and models to facilitate future research.
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization
Xujia Wang | Yunjia Qi | Bin Xu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Parameter-Efficient Fine-Tuning (PEFT) methods, such as LoRA, significantly reduce the number of trainable parameters by introducing low-rank decomposition matrices. However, existing methods perform extensive matrix multiplications in domain specialization tasks, resulting in computational inefficiency and sub-optimal fine-tuning performance. Hence, we propose LoSiA (**Lo**w-Resources **S**ubnet **I**ntegration **A**daptation), an innovative method that dynamically localizes and optimizes critical parameters during the training process. Specifically, it identifies a sub-network using gradient sparsity analysis and optimizes it as the trainable target. This design enables effective high-rank adaptation by updating only the sub-network parameters, reducing the additional matrix multiplication. We also present LoSiA-Pro, a faster implementation of LoSiA, which reduces the training latency by about 27% compared to LoRA. Extensive evaluations show that our method achieves minimal performance drop compared to full fine-tuning, while requiring the least training time across domain specialization and common-sense reasoning tasks. Further analysis shows that LoSiA also reduces forgetting during continued training.
2024
LM-Interview: An Easy-to-use Smart Interviewer System via Knowledge-guided Language Model Exploitation
Hanming Li | Jifan Yu | Ruimiao Li | Zhanxin Hao | Yan Xuan | Jiaxi Yuan | Bin Xu | Juanzi Li | Zhiyuan Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Semi-structured interviews are a crucial method of data acquisition in qualitative research. Typically controlled by the interviewer, the process progresses through a question-and-answer format aimed at eliciting information from the interviewee. However, interviews are highly time-consuming and demand considerable experience from interviewers, which greatly limits the efficiency and feasibility of data collection. Therefore, we introduce LM-Interview, a novel system designed to automate the process of preparing, conducting, and analyzing semi-structured interviews. Experimental results demonstrate that LM-Interview achieves performance comparable to that of skilled human interviewers.
MAVEN-FACT: A Large-scale Event Factuality Detection Dataset
Chunyang Li | Hao Peng | Xiaozhi Wang | Yunjia Qi | Lei Hou | Bin Xu | Juanzi Li
Findings of the Association for Computational Linguistics: EMNLP 2024
The Event Factuality Detection (EFD) task determines the factuality of textual events, i.e., classifying whether an event is a fact, possibility, or impossibility, which is essential for faithfully understanding and utilizing event knowledge. However, due to the lack of high-quality large-scale data, event factuality detection is under-explored in event understanding research, which limits the development of the EFD community. To address these issues and provide faithful event understanding, we introduce MAVEN-FACT, a large-scale and high-quality EFD dataset based on the MAVEN dataset. MAVEN-FACT includes factuality annotations of 112,276 events, making it the largest EFD dataset. Extensive experiments demonstrate that MAVEN-FACT is challenging for both conventional fine-tuned models and large language models (LLMs). Thanks to the comprehensive annotations of event arguments and relations in MAVEN, MAVEN-FACT also supports further analyses, and we find that adopting event arguments and relations helps in event factuality detection for fine-tuned models but does not benefit LLMs. Furthermore, we preliminarily study an application case of event factuality detection and find it helps in mitigating event-related hallucination in LLMs. We will release our dataset and code to facilitate further research on event factuality detection.
ADELIE: Aligning Large Language Models on Information Extraction
Yunjia Qi | Hao Peng | Xiaozhi Wang | Bin Xu | Lei Hou | Juanzi Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
DocEE-zh: A Fine-grained Benchmark for Chinese Document-level Event Extraction
Minghui Liu | MeiHan Tong | Yangda Peng | Lei Hou | Juanzi Li | Bin Xu
Findings of the Association for Computational Linguistics: EMNLP 2024
Event extraction aims to identify events and then extract the arguments involved in those events. In recent years, there has been a gradual shift from sentence-level event extraction to document-level event extraction research. Despite the significant success achieved in English event extraction research, event extraction in Chinese remains largely unexplored. A major obstacle to promoting Chinese document-level event extraction is the lack of fine-grained datasets with wide domain coverage for model training and evaluation. In this paper, we propose DocEE-zh, a new Chinese document-level event extraction dataset comprising over 36,000 events and more than 210,000 arguments. DocEE-zh is an extension of the DocEE dataset, utilizing the same event schema, and all data has been meticulously annotated by human experts. We highlight two features: a focus on high-interest event types and fine-grained argument types. Experimental results indicate that state-of-the-art models still fail to achieve satisfactory performance, with an F1 score of 45.88% on the event argument extraction task, revealing that Chinese document-level event extraction (DocEE) remains an unresolved challenge. DocEE-zh is now available at https://github.com/tongmeihan1995/DocEE.git.
Co-authors
- Juanzi Li 8
- Lei Hou 7
- Hao Peng 5
- Yunjia Qi 5
- Xiaozhi Wang 4
- Jifan Yu 3
- Zhanxin Hao 2
- Haoxuan Li 2
- Joy Jia Yin Lim 2
- Zhiyuan Liu 2
- Huiqin Liu 2
- Zijun Yao 2
- Daniel Zhang-Li 2
- Xin Cong 1
- Ye He 1
- Hanming Li 1
- Ruimiao Li 1
- Zekun Li 1
- Chunyang Li 1
- Binglin Liu 1
- Minghui Liu 1
- Yangda Peng 1
- Meihan Tong 1
- Shangqing Tu 1
- Jiangyi Wang 1
- Yucheng Wang 1
- Xujia Wang 1
- Yan Xuan 1
- Shen Yang 1
- Jiaxi Yuan 1
- Yu Zhang 1