Zulong Chen
2026
Learning from Emptiness: De-biasing Listwise Rerankers with Content-Agnostic Probability Calibration
Hang Lv | Hongchao Gu | Ruiqing Yang | Liangyue Li | Zulong Chen | Defu Lian | Hao Wang | Enhong Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Generative listwise reranking leverages global context for superior retrieval but is plagued by intrinsic position bias, where models exhibit structural sensitivity to input order independent of relevance. Existing mitigations present a dilemma: inference-time aggregation incurs prohibitive latency, while training-based methods often fail to eradicate ingrained priors, particularly in compact models. To resolve this dilemma, we propose CapCal (Content-Agnostic Probability Calibration), a training-free framework that mechanically decouples positional bias from ranking decisions. By estimating the bias distribution via content-free placeholders, CapCal rectifies output logits through an entropy-adaptive contrastive mechanism. Evaluations across 10 benchmarks confirm that CapCal achieves superior performance among training-free methods while preserving single-pass efficiency. Notably, it unlocks the latent potential of lightweight models (e.g., 0.6B), delivering absolute NDCG gains exceeding 10 points and outperforming computationally expensive data augmentation strategies.
Long-Chain Reasoning Distillation via Adaptive Prefix Alignment
Zhenghao Liu | Zhuoyang Wu | Xinze Li | Yukun Yan | Shuo Wang | Zulong Chen | Yu Gu | Ge Yu | Maosong Sun
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have demonstrated remarkable reasoning capabilities, particularly in solving complex mathematical problems. Recent studies show that distilling long reasoning trajectories can effectively enhance the reasoning performance of small-scale student models. However, teacher-generated reasoning trajectories are often excessively long and structurally complex, making them difficult for student models to learn. This mismatch leads to a gap between the provided supervision signal and the learning capacity of the student model. To address this challenge, we propose Prefix-ALIGNment distillation (P-ALIGN), a framework that fully exploits teacher CoTs for distillation through adaptive prefix alignment. Specifically, P-ALIGN adaptively truncates teacher-generated reasoning trajectories by determining whether the remaining suffix is concise and sufficient to guide the student model. Then, P-ALIGN leverages the teacher-generated prefix to supervise the student model, encouraging effective prefix alignment. Experiments on multiple mathematical reasoning benchmarks demonstrate that P-ALIGN outperforms all baselines by over 3%. Further analysis indicates that the prefixes constructed by P-ALIGN provide more effective supervision signals, while avoiding the negative impact of redundant and uncertain reasoning components. All codes are available at https://github.com/NEUIR/P-ALIGN.
Empirical Analysis of Decoding Biases in Masked Diffusion Models
Pengcheng Huang | Tianming Liu | Zhenghao Liu | Yukun Yan | Shuo Wang | Tong Xiao | Zulong Chen | Maosong Sun
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Masked Diffusion Models (MDMs) have recently emerged as a promising non-autoregressive paradigm for sequence generation. However, their performance is highly sensitive to the choice of decoding strategy. In this work, we reveal that prevalent uncertainty-based decoding strategies induce two decoding biases in MDMs: rigid boundary bias and trivial token bias. These biases limit the model’s reasoning ability and ultimately degrade generation quality. To address these challenges, we propose UNmasking Calibration for DecOding DEbiasing (UNCODE), a decoding calibration framework that regularizes uncertainty-based decoding by incorporating two complementary priors to shape global decoding trajectories and promote content informativeness. Extensive experiments on three advanced MDMs across seven reasoning- and planning-intensive benchmarks demonstrate that UNCODE consistently outperforms existing decoding strategies by more than 7%, while achieving performance comparable to autoregressive models of similar parameter scales. Our code will be made publicly available on GitHub.
UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents
Yifan Ji | Zhipeng Xu | Zhenghao Liu | Zulong Chen | Qian Zhang | Zhibo Yang | Junyang Lin | Yu Gu | Ge Yu | Maosong Sun
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Key Information Extraction (KIE) from real-world documents remains challenging due to substantial variations in layout structures, visual quality, and task-specific information requirements. Recent Large Multimodal Models (LMMs) have shown promising potential for performing end-to-end KIE directly from document images. To enable a comprehensive and systematic evaluation across realistic and diverse application scenarios, we introduce UNIKIE-BENCH, a unified benchmark designed to rigorously evaluate the KIE capabilities of LMMs. UNIKIE-BENCH consists of two complementary tracks: a constrained-category KIE track with scenario-predefined schemas that reflect practical application needs, and an open-category KIE track that extracts any key information that is explicitly present in the document. Experiments on 15 state-of-the-art LMMs reveal substantial performance degradation under diverse schema definitions, long-tail key fields, and complex layouts, along with pronounced performance disparities across different document types and scenarios. These findings underscore persistent challenges in grounding accuracy and layout-aware reasoning for LMM-based KIE. All codes and datasets are available at https://github.com/NEUIR/UNIKIE-BENCH.
Enhancing Online Recruitment with Category-Aware MoE and LLM-based Data Augmentation
Minping Chen | Bing Xu | Zulong Chen | Chuanfei Xu | Ying Zhou | Zui Tao | Zeyi Wen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Person-Job Fit (PJF) is a critical component of online recruitment. Existing approaches face several challenges, particularly in handling low-quality job descriptions and similar candidate-job pairs, which impair model performance. To address these challenges, this paper proposes a large language model (LLM) based method with two novel techniques: (1) LLM-based data augmentation, which polishes and rewrites low-quality job descriptions by leveraging chain-of-thought (CoT) prompts, and (2) a category-aware Mixture of Experts (MoE) that assists in identifying similar candidate-job pairs. This MoE module incorporates category embeddings to dynamically assign weights to the experts and learns more distinguishable patterns for similar candidate-job pairs. We perform offline evaluations and online A/B tests on our recruitment platform. Our method achieves relative improvements of 2.40% in AUC and 7.46% in GAUC over existing methods, and boosts click-through conversion rate (CTCVR) by 19.4% in online tests, saving millions of CNY in external headhunting expenses.
2025
Character is Destiny: Can Persona-assigned Language Models Make Personal Choices?
Rui Xu | Xintao Wang | Jiangjie Chen | Siyu Yuan | Xinfeng Yuan | Jiaqing Liang | Zulong Chen | Xiaoqingdong | Yanghua Xiao
Findings of the Association for Computational Linguistics: EMNLP 2025
Can Large Language Models (LLMs) simulate humans in making important decisions? Recent research has unveiled the potential of using LLMs to develop role-playing language agents (RPLAs), mimicking mainly the knowledge and tones of various characters. However, imitative decision-making necessitates a more nuanced understanding of personas. In this paper, we benchmark the ability of LLMs in persona-driven decision-making. Specifically, we investigate whether LLMs can predict characters’ decisions provided by the preceding stories in high-quality novels. Leveraging character analyses written by literary experts, we construct a dataset LIFECHOICE comprising 2,512 characters’ decision points from 470 books. Then, we conduct comprehensive experiments on LIFECHOICE with various LLMs and RPLA methodologies. The results demonstrate that state-of-the-art LLMs exhibit promising capabilities in this task, yet substantial room for improvement remains. Hence, we further propose the CHARMAP method, which adopts persona-based memory retrieval and significantly advances RPLAs on this task.
SEAL: Structure and Element Aware Learning Improves Long Structured Document Retrieval
Xinhao Huang | Zhibo Ren | Yipeng Yu | Ying Zhou | Zulong Chen | Zeyi Wen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
In long structured document retrieval, existing methods typically fine-tune pre-trained language models (PLMs) using contrastive learning on datasets lacking explicit structural information. This practice suffers from two critical issues: 1) current methods fail to leverage structural features and element-level semantics effectively, and 2) datasets containing structural metadata are lacking. To bridge these gaps, we propose SEAL, a novel contrastive learning framework. It leverages structure-aware learning to preserve semantic hierarchies and masked element alignment for fine-grained semantic discrimination. Furthermore, we release StructDocRetrieval, a long structured document retrieval dataset with rich structural annotations. Extensive experiments on both the released and industrial datasets across various modern PLMs, together with online A/B testing, demonstrate consistent improvements, boosting NDCG@10 from 73.96% to 77.84% on BGE-M3. The resources are available at https://github.com/xinhaoH/SEAL.
Enhancing Talent Search Ranking with Role-Aware Expert Mixtures and LLM-based Fine-Grained Job Descriptions
Jihang Li | Bing Xu | Zulong Chen | Chuanfei Xu | Minping Chen | Suyu Liu | Ying Zhou | Zeyi Wen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Talent search is a cornerstone of modern recruitment systems, yet existing approaches often struggle to capture nuanced job-specific preferences, model recruiter behavior at a fine-grained level, and mitigate noise from subjective human judgments. We present a novel framework that enhances talent search effectiveness and delivers substantial business value through two key innovations: (i) leveraging LLMs to extract fine-grained recruitment signals from job descriptions and historical hiring data, and (ii) employing a role-aware multi-gate MoE network to capture behavioral differences across recruiter roles. To further reduce noise, we introduce a multi-task learning module that jointly optimizes click-through rate (CTR), conversion rate (CVR), and resume matching relevance. Experiments on real-world recruitment data and online A/B testing show relative AUC gains of 1.70% (CTR) and 5.97% (CVR), and a 17.29% lift in click-through conversion rate. These improvements reduce dependence on external sourcing channels, enabling an estimated annual cost saving of millions of CNY.
DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling
Aili Chen | Chengyu Du | Jiangjie Chen | Jinghan Xu | Yikai Zhang | Siyu Yuan | Zulong Chen | Liangyue Li | Yanghua Xiao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
To advance personalized applications such as recommendation systems and user behavior prediction, recent research increasingly adopts large language models (LLMs) for human-readable persona modeling. In dynamic real-world scenarios, effective persona modeling necessitates leveraging streaming behavior data to continually optimize user personas. However, existing methods, whether regenerating personas or incrementally extending them with new behaviors, often fail to achieve sustained improvements in persona quality or future behavior prediction accuracy. To address this, we propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization. Specifically, we enhance the model’s direction-search capability through an iterative reinforcement learning framework, allowing it to automatically identify effective update directions and optimize personas using discrepancies between user behaviors and model predictions. Extensive experiments on dynamic persona modeling involving 4,800 users across 10 domains highlight DEEPER’s superior persona optimization capabilities, delivering a 32.2% average reduction in user behavior prediction error over four update rounds and outperforming the best baseline by 22.92%.
2024
Retrieval-style In-context Learning for Few-shot Hierarchical Text Classification
Huiyao Chen | Yu Zhao | Zulong Chen | Mengjia Wang | Liangyue Li | Meishan Zhang | Min Zhang
Transactions of the Association for Computational Linguistics, Volume 12
Hierarchical text classification (HTC) is an important task with broad applications, and few-shot HTC has gained increasing interest recently. While in-context learning (ICL) with large language models (LLMs) has achieved significant success in few-shot learning, it is not as effective for HTC because of the expansive hierarchical label sets and extremely ambiguous labels. In this work, we introduce the first ICL-based framework with LLM for few-shot HTC. We exploit a retrieval database to identify relevant demonstrations, and an iterative policy to manage multi-layer hierarchical labels. Particularly, we equip the retrieval database with HTC label-aware representations for the input texts, which is achieved by continual training on a pretrained language model with masked language modeling (MLM), layer-wise classification (CLS, specifically for HTC), and a novel divergent contrastive learning (DCL, mainly for adjacent semantically similar labels) objective. Experimental results on three benchmark datasets demonstrate superior performance of our method, and we can achieve state-of-the-art results in few-shot HTC.
Mixed Distillation Helps Smaller Language Models Reason Better
Chenglin Li | Qianglong Chen | Liangyue Li | Caiyu Wang | Feng Tao | Yicheng Li | Zulong Chen | Yin Zhang
Findings of the Association for Computational Linguistics: EMNLP 2024
As large language models (LLMs) have demonstrated impressive multiple step-by-step reasoning capabilities in recent natural language processing (NLP) reasoning tasks, many studies are interested in distilling reasoning abilities into smaller language models (SLMs) via fine-tuning. Previous distillation methods usually utilize the capabilities of LLMs to generate chain-of-thought (CoT) samples to teach SLMs. However, this distillation approach performs poorly in certain scenarios due to the limitations of CoT. In this work, we introduce a novel Mixed Distillation (MD) framework, distilling multiple step-by-step reasoning abilities into SLMs. First, we leverage LLMs to generate multiple step-by-step reasoning rationales by sampling automatically. Then, we create high-quality, well-balanced mixed thought data and design a novel multi-task loss to help SLMs better learn and adaptively activate multiple step-by-step reasoning. Our extensive experiments demonstrate that MD enhances both single-path (using either CoT or PoT) and multi-path (using both CoT and PoT) reasoning abilities of SLMs during inference across reasoning tasks. Notably, a single model generated by MD exceeds the comprehensive performance of an ensemble of two individual CoT and PoT distilled models. Mistral-7B using MD can achieve remarkable improvements of 87.5%, 74.0% and 77.1% on SVAMP, GSM8K and ASDIV, respectively, outperforming the teacher model, GPT-3.5-Turbo. We hope our work provides insight into SLMs’ multiple step-by-step reasoning abilities.
AutoScraper: A Progressive Understanding Web Agent for Web Scraper Generation
Wenhao Huang | Zhouhong Gu | Chenghao Peng | Jiaqing Liang | Zhixu Li | Yanghua Xiao | Liqian Wen | Zulong Chen
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Web scraping is a powerful technique that extracts data from websites, enabling automated data collection, enhancing data analysis capabilities, and minimizing manual data entry efforts. Existing approaches have drawbacks: wrapper-based methods suffer from limited adaptability and scalability when faced with a new website, while language agents, empowered by large language models (LLMs), exhibit poor reusability in diverse web environments. In this work, we introduce the paradigm of generating web scrapers with LLMs and propose AutoScraper, a two-stage framework that can handle diverse and changing web environments more efficiently. AutoScraper leverages the hierarchical structure of HTML and similarity across different web pages for generating web scrapers. In addition, we propose a new executability metric for better measuring the performance of web scraper generation tasks. We conduct comprehensive experiments with multiple LLMs and demonstrate the effectiveness of our framework. Our work is now open-source.
Co-authors
- Liangyue Li 4
- Zhenghao Liu (刘正皓) 3
- Maosong Sun (孙茂松) 3
- Zeyi Wen 3
- Yanghua Xiao 3
- Ying Zhou 3
- Jiangjie Chen 2
- Minping Chen 2
- Yu Gu (谷峪) 2
- Jiaqing Liang 2
- Shuo Wang 2
- Bing Xu 2
- Chuanfei Xu 2
- Yukun Yan (闫宇坤) 2
- Ge Yu (于戈) 2
- Siyu Yuan 2
- Aili Chen 1
- Enhong Chen 1
- Huiyao Chen 1
- Qianglong Chen 1
- Chengyu Du 1
- Hongchao Gu 1
- Zhouhong Gu 1
- Pengcheng Huang 1
- Wenhao Huang 1
- Xinhao Huang 1
- Yifan Ji 1
- Chenglin Li 1
- Jihang Li 1
- Xinze Li 1
- Yicheng Li 1
- Zhixu Li 1
- Defu Lian 1
- Junyang Lin 1
- Suyu Liu 1
- Tianming Liu 1
- Hang Lv 1
- Chenghao Peng 1
- Zhibo Ren 1
- Feng Tao 1
- Zui Tao 1
- Caiyu Wang 1
- Hao Wang 1
- Mengjia Wang 1
- Xintao Wang 1
- Liqian Wen 1
- Zhuoyang Wu 1
- Tong Xiao (肖桐) 1
- Xiaoqingdong 1
- Jinghan Xu 1
- Rui Xu 1
- Zhipeng Xu 1
- Ruiqing Yang 1
- ZhiBo Yang 1
- Yipeng Yu 1
- Xinfeng Yuan 1
- Meishan Zhang 1
- Min Zhang 1
- Qian Zhang 1
- Yikai Zhang 1
- Yin Zhang 1
- Yu Zhao 1