Huidan Xu
Also published as: 会丹 徐
2026
Entropy-Aware Reshaping of Reinforcement Signals for Multi-Answer Reasoning
Zhi Li | Huidan Xu | Zhen Hu | Yali Du | Ying Liu
Findings of the Association for Computational Linguistics: ACL 2026
Reinforcement learning with verifiable rewards (RLVR) is a standard post-training paradigm for large language models (LLMs), typically relying on group-wise reward and advantage normalization for stability. In set-valued multi-answer tasks, where multiple outputs may be simultaneously correct, this normalization can over-amplify a small number of early high-reward samples, suppressing learning signals from other valid generations and leading to overly concentrated updates. We propose Entropy-Aware Reshaping of Reinforcement Signals (EARS), a framework that reshapes how learning signals are normalized and aggregated. EARS uses token-level predictive entropy as an uncertainty cue to compute entropy-weighted reward statistics for advantage normalization, encouraging broader exploration and more balanced learning-signal allocation early in training. An adaptive decay schedule then anneals uncertainty-aware reweighting back to standard group normalization to ensure stable convergence. EARS further incorporates a correctness-gated multi-head process reward that provides auxiliary supervision on reasoning traces while remaining aligned with verifiable correctness. Experiments on MCTACO and MMLU-Multi using Qwen2.5-7B and Llama-3.1-8B-Instruct demonstrate consistent improvements in exact-set accuracy, training stability, and cross-dataset transfer performance on set-valued multi-answer reasoning.
2021
先秦词网构建及梵汉对比研究(The Construction of Pre-Qin Ancient Chinese WordNet and Cross Language Comparative Study between Ancient Sanskrit WordNet and Pre-Qin Ancient Chinese WordNet)
Xuehui Lu (卢雪晖) | Huidan Xu (徐会丹) | Siyu Chen (陈思瑜) | Bin Li (李斌)
Proceedings of the 20th Chinese National Conference on Computational Linguistics
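Pre-Qin Chinese holds an important place in the study of the history of the Chinese language, yet previous research has never produced a structured Pre-Qin lexical resource, which makes it difficult to meet the needs of ancient Chinese information processing and cross-language comparison. Internationally, wordnets for dozens of languages have been built on the semantic-class architecture of the English WordNet, and they have become a foundational resource for multilingual natural language processing and cross-language comparison. This paper surveys the construction of wordnets in China and abroad, in particular wordnets for ancient languages and Chinese wordnets, and then describes in detail the construction and correction of the Pre-Qin WordNet, which covers 43,591 words, 61,227 senses, and 17,975 semantic classes. Through a cross-language comparison with the Ancient Sanskrit WordNet, the paper also attempts to analyze the lexical commonalities and differences between these two ancient languages and provides a preliminary validation of the Pre-Qin WordNet.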