Ziqi Zhao
2026
Reinforced Efficient Reasoning via Semantically Diverse Exploration
Ziqi Zhao | Zhaochun Ren | Jiahong Zou | Liu Yang | Zhiwei Xu | Xuri Ge | Zhumin Chen | Xinyu Ma | Daiting Shi | Shuaiqiang Wang | Dawei Yin | Xin Xin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Ziqi Zhao | Zhaochun Ren | Jiahong Zou | Liu Yang | Zhiwei Xu | Xuri Ge | Zhumin Chen | Xinyu Ma | Daiting Shi | Shuaiqiang Wang | Dawei Yin | Xin Xin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement learning with verifiable rewards (RLVR) has proven effective in enhancing the reasoning of large language models (LLMs). Monte Carlo Tree Search (MCTS)-based extensions improve upon vanilla RLVR (e.g., GRPO) by providing tree-based reasoning rollouts that enable fine-grained and segment-level credit assignment. However, existing methods still suffer from limited exploration diversity and inefficient reasoning. To address the above challenges, we propose reinforced efficient reasoning via semantically diverse explorations, i.e., ROSE, for LLMs. To encourage more diverse reasoning exploration, our method incorporates a semantic-entropy-based branching strategy and an 𝜀-exploration mechanism. The former operates on already sampled reasoning rollouts to capture semantic uncertainty and select branching points with high semantic divergence to generate new successive reasoning paths, whereas the latter stochastically initiates reasoning rollouts from the root, preventing the search process from becoming overly local. To improve efficiency, we design a length-aware segment-level advantage estimator that rewards concise and correct reasoning while penalizing unnecessarily long reasoning chains. Extensive experiments on various mathematical reasoning benchmarks with Qwen and Llama models validate the effectiveness and efficiency of ROSE. Codes are available at https://github.com/ZiqiZhao1/ROSE-rl.
2023
Generalizing Few-Shot Named Entity Recognizers to Unseen Domains with Type-Related Features
Zihan Wang | Ziqi Zhao | Zhumin Chen | Pengjie Ren | Maarten de Rijke | Zhaochun Ren
Findings of the Association for Computational Linguistics: EMNLP 2023
Zihan Wang | Ziqi Zhao | Zhumin Chen | Pengjie Ren | Maarten de Rijke | Zhaochun Ren
Findings of the Association for Computational Linguistics: EMNLP 2023
Few-shot named entity recognition (NER) has shown remarkable progress in identifying entities in low-resource domains. However, few-shot NER methods still struggle with out-of-domain (OOD) examples due to their reliance on manual labeling for the target domain. To address this limitation, recent studies enable generalization to an unseen target domain with only a few labeled examples using data augmentation techniques. Two important challenges remain: First, augmentation is limited to the training data, resulting in minimal overlap between the generated data and OOD examples. Second, knowledge transfer is implicit and insufficient, severely hindering model generalizability and the integration of knowledge from the source domain. In this paper, we propose a framework, prompt learning with type-related features (PLTR), to address these challenges. To identify useful knowledge in the source domain and enhance knowledge transfer, PLTR automatically extracts entity type-related features (TRFs) based on mutual information criteria. To bridge the gap between training and OOD data, PLTR generates a unique prompt for each unseen example by selecting relevant TRFs. We show that PLTR achieves significant performance improvements on in-domain and cross-domain datasets. The use of PLTR facilitates model adaptation and increases representation similarities between the source and unseen domains.