Xuhui Zheng


2025

AHVE-CNER: Aligned Hanzi Visual Encoding Enhance Chinese Named Entity Recognition with Multi-Information
Xuhui Zheng | Zhiyuan Min | Bin Shi | Hao Wang
Proceedings of the 31st International Conference on Computational Linguistics

The integration of multi-modal information, especially the graphic features of Hanzi, is crucial for improving the performance of Chinese Named Entity Recognition (NER) tasks. However, existing glyph-based models frequently neglect the relationship between pictorial elements and radicals. This paper presents AHVE-CNER, a model that integrates multi-source visual and phonetic information of Hanzi, while explicitly aligning pictographic features with their corresponding radicals. We propose the Gated Pangu-𝜋 Cross Transformer to effectively facilitate the integration of these multi-modal representations. By leveraging a multi-source glyph alignment strategy, AHVE-CNER demonstrates an improved capability to capture the visual and semantic nuances of Hanzi for NER tasks. Extensive experiments on benchmark datasets validate that AHVE-CNER achieves superior performance compared to existing multi-modal Chinese NER methods. Additional ablation studies further confirm the effectiveness of our visual alignment module and the fusion approach.
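The listing above gives no implementation details for the Gated Pangu-𝜋 Cross Transformer, so what follows is only a minimal sketch of the general idea it gestures at: a gated cross-attention layer that lets textual character representations selectively absorb aligned visual and phonetic features. The module name, dimensions, and gating form are hypothetical illustrations, not the authors' code.

```python
# Hypothetical sketch of gated cross-attention fusion between character (text)
# embeddings and auxiliary glyph/pinyin embeddings. Not the AHVE-CNER code;
# all shapes and module choices here are illustrative assumptions.
import torch
import torch.nn as nn


class GatedCrossFusion(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        # Cross-attention: text tokens attend to glyph/pinyin features.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate decides, per token and per dimension, how much visual/phonetic
        # information to mix into the textual representation.
        self.gate = nn.Linear(2 * d_model, d_model)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        # text: (batch, seq_len, d_model) character embeddings
        # aux:  (batch, seq_len, d_model) aligned glyph + pinyin features
        attended, _ = self.cross_attn(query=text, key=aux, value=aux)
        g = torch.sigmoid(self.gate(torch.cat([text, attended], dim=-1)))
        return self.norm(text + g * attended)


if __name__ == "__main__":
    fusion = GatedCrossFusion()
    chars = torch.randn(2, 16, 256)   # textual character embeddings
    glyphs = torch.randn(2, 16, 256)  # multi-source visual/phonetic embeddings
    print(fusion(chars, glyphs).shape)  # torch.Size([2, 16, 256])
```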

Enhancing Extractive Question Answering in Multiparty Dialogues with Logical Inference Memory Network
Shu Zhou | Rui Zhao | Zhengda Zhou | Haohan Yi | Xuhui Zheng | Hao Wang
Proceedings of the 31st International Conference on Computational Linguistics

Multiparty dialogue question answering (QA) in machine reading comprehension (MRC) is a challenging task due to its complex information flow interactions and logical QA inference. Existing models typically handle such QA tasks by decoupling dialogue information at both the speaker and utterance levels. However, few of them consider the logical inference relations in multiparty dialogue QA, leading to suboptimal QA performance. To address this issue, this paper proposes a memory network with logical inference (LIMN) for extractive QA in multiparty dialogues. LIMN introduces an inference module, pretrained with plain QA articles as external knowledge, that generates logical-inference-aware latent representations of multiparty dialogues. To further model the complex interactions among logical dialogue contexts, questions, and key-utterance information, a key-utterance-based interaction method is proposed. Moreover, a multitask learning strategy is adopted for robust MRC. Extensive experiments were conducted on the Molweni and FriendsQA benchmarks, which contain 25k and 10k questions, respectively. Comparative results show that LIMN achieves state-of-the-art results on both benchmarks, demonstrating the benefit of logical QA inference in multiparty dialogue QA tasks.
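The abstract names a key-utterance-based interaction over question, dialogue context, and key-utterance information but provides no formulas, so the sketch below is only one plausible reading: utterances are scored against the question and their tokens are softly re-weighted by that score before span extraction. The function name, mean pooling, and dot-product scoring are assumptions for illustration, not LIMN's actual design.

```python
# Hypothetical sketch of a key-utterance-based interaction: score utterances
# against the question, then gate each token by its utterance's "keyness".
import torch
import torch.nn.functional as F


def key_utterance_interaction(
    token_reps: torch.Tensor,     # (num_tokens, d) contextual token states
    utterance_ids: torch.Tensor,  # (num_tokens,) utterance index per token
    question_rep: torch.Tensor,   # (d,) pooled question representation
) -> torch.Tensor:
    num_utts = int(utterance_ids.max().item()) + 1
    d = token_reps.size(-1)

    # Mean-pool tokens into utterance-level representations.
    utt_reps = torch.zeros(num_utts, d).index_add_(0, utterance_ids, token_reps)
    counts = torch.bincount(utterance_ids, minlength=num_utts).clamp(min=1)
    utt_reps = utt_reps / counts.unsqueeze(-1)

    # Score each utterance against the question; softmax gives "keyness".
    keyness = F.softmax(utt_reps @ question_rep / d ** 0.5, dim=0)

    # Broadcast each utterance's keyness back to its tokens as a soft gate.
    return token_reps * keyness[utterance_ids].unsqueeze(-1)


if __name__ == "__main__":
    toks = torch.randn(10, 64)
    utt_ids = torch.tensor([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])
    q = torch.randn(64)
    print(key_utterance_interaction(toks, utt_ids, q).shape)  # torch.Size([10, 64])
```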

StepSearch: Igniting LLMs Search Ability via Step-Wise Proximal Policy Optimization
Xuhui Zheng | Kang An | Ziliang Wang | Yuhang Wang | Yichao Wu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Efficient multi-hop reasoning requires Large Language Model (LLM)-based agents to acquire high-value external knowledge iteratively. Previous work has explored reinforcement learning (RL) to train LLMs to perform search-based document retrieval, achieving notable improvements in QA performance, but these methods underperform on complex multi-hop QA because their rewards are sparse, derived only from a global signal. To address this gap, we introduce StepSearch, a framework that trains search LLMs with a step-wise proximal policy optimization method. It provides richer, more detailed intermediate search rewards and token-level process supervision, based on information gain and redundancy penalties, to better guide each search step. We constructed a fine-grained question-answering dataset containing sub-question-level search trajectories, built from open-source datasets through a dedicated data pipeline. On standard multi-hop QA benchmarks, StepSearch significantly outperforms global-reward baselines, achieving 11.2% and 4.2% absolute improvements for 3B and 7B models, respectively, over various search-with-RL baselines using only 19k training examples, demonstrating the effectiveness of fine-grained, step-wise supervision in optimizing deep-search LLMs. The project is open source at https://github.com/Zillwang/StepSearch
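As a rough illustration of the kind of step-wise supervision the abstract describes (intermediate rewards based on information gain and redundancy penalties), the sketch below scores a single retrieval step. The exact reward used by StepSearch is not given in this listing; the token-overlap information gain, Jaccard redundancy, and the `redundancy_weight` parameter are assumptions made for this example.

```python
# Hypothetical per-step search reward: reward newly covered gold evidence
# (information gain) and penalize overlap with earlier retrievals (redundancy).
def step_reward(
    retrieved_doc: set[str],   # token set of the document fetched this step
    gold_evidence: set[str],   # token set of the gold supporting evidence
    covered_so_far: set[str],  # gold tokens already covered by earlier steps
    seen_docs: list[set[str]],  # token sets of previously retrieved documents
    redundancy_weight: float = 0.5,
) -> tuple[float, set[str]]:
    # Information gain: fraction of gold evidence newly covered by this step.
    newly_covered = (retrieved_doc & gold_evidence) - covered_so_far
    info_gain = len(newly_covered) / max(len(gold_evidence), 1)

    # Redundancy: maximum Jaccard overlap with any earlier retrieved document.
    redundancy = max(
        (len(retrieved_doc & d) / max(len(retrieved_doc | d), 1) for d in seen_docs),
        default=0.0,
    )

    reward = info_gain - redundancy_weight * redundancy
    return reward, covered_so_far | newly_covered


if __name__ == "__main__":
    gold = {"paris", "capital", "france"}
    r1, covered = step_reward({"paris", "france", "city"}, gold, set(), [])
    r2, covered = step_reward({"paris", "france", "city"}, gold, covered,
                              [{"paris", "france", "city"}])
    print(round(r1, 2), round(r2, 2))  # useful first step, penalized repeat step
```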