Guotong Geng
2026
When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval
Mingxu Tao | Jiawei Hu | Xian Zhou | Wenpeng Hu | Jiajun Cheng | Yunbo Cao | Zhunchen Luo | Guotong Geng
Findings of the Association for Computational Linguistics: ACL 2026
Legal case retrieval remains challenging due to the complexity of legal language and the need for precise lexical alignment between queries and relevant cases. Although dense retrieval models have achieved notable progress, empirical studies show that BM25 continues to serve as a strong baseline in this domain. This motivates us to propose a self-evolving framework for rule-driven query rewriting that enhances BM25 without any parameter training. The framework equips an LLM-based agent with an automatic evaluation environment, enabling it to iteratively create rewriting rules, plan validation experiments over rule combinations, and eliminate ineffective rules based on historical feedback. We evaluate our method on the Chinese legal case retrieval benchmark LeCaRD-v2. Experimental results demonstrate that the proposed framework outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection, particularly when powered by a high-capacity core LLM. We also conduct detailed analyses to investigate the mechanisms underlying self-evolution. Our findings reveal that the LLM’s capability to leverage previous experimental results and its intrinsic knowledge of rule elimination play critical roles in refining the rule set via self-evolution.
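To make the idea of rule-driven query rewriting in front of BM25 concrete, here is a minimal sketch. It is not the authors' implementation: the `Rule` type, `apply_rules`, and the example rule `drop_function_words` are hypothetical names introduced for illustration, and the scoring function uses one common BM25 variant (non-negative IDF, `k1=1.5`, `b=0.75`), which is an assumption rather than the paper's configuration.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical: a rewriting rule maps a tokenized query to a new tokenized query.
Rule = Callable[[List[str]], List[str]]


def bm25_scores(query: Sequence[str], docs: Sequence[Sequence[str]],
                k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every tokenized document against the query with BM25."""
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def apply_rules(query: List[str], rules: Sequence[Rule]) -> List[str]:
    """Apply the rewriting rules in order; no model parameters are trained."""
    for rule in rules:
        query = rule(query)
    return query


def drop_function_words(query: List[str]) -> List[str]:
    """Hypothetical example rule: remove a few low-information tokens."""
    return [t for t in query if t not in {"the", "of", "a"}]


docs = [["theft", "of", "vehicle"], ["contract", "dispute"]]
rewritten = apply_rules(["the", "theft"], [drop_function_words])
scores = bm25_scores(rewritten, docs)
```

In a setup like the one the abstract describes, an agent would propose such rules, score rule combinations against a retrieval metric, and discard rules that do not help; that search loop is outside this sketch.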
IS-CoT: Breaking the Long-form Generation Collapse via Interleaved Structural Thinking
Zechen Sun | Yuyang Sun | Zecheng Tang | Juntao Li | Wenpeng Hu | Wenliang Chen | Zhunchen Luo | Guotong Geng | Min Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Generating coherent and controllable long-form content remains a persistent challenge for Large Language Models (LLMs). While reasoning-enhanced models have demonstrated success in logic-intensive domains, our evaluation reveals that they suffer from a severe length collapse in open-ended writing, where performance degrades sharply as target lengths exceed 2,000 words. We attribute this failure to the limitation of static hierarchical planning, which struggles to provide dynamic guidance over extended contexts. To bridge this gap, we introduce the **Interleaved Structural Chain-of-Thought (IS-CoT)** framework. Unlike external agentic workflows, **IS-CoT** embeds a dynamic Plan-Write-Reflect cycle into the generation process, enabling continuous strategy adaptation and global alignment without additional assistance. Based on this framework, we construct a high-quality dataset of interleaved reasoning traces via a multi-teacher pipeline and train **IS-Writer-8B**. Experiments demonstrate that IS-Writer-8B achieves state-of-the-art performance on challenging long-form benchmarks (e.g., +3.08 vs. DeepSeek-V3.2 on LongBench-Write), exhibiting robust length compliance and coherence competitive with significantly larger proprietary models.
SHARP: Self-adaptive Harmful Category-aware Prompt Generation for Black-box Jailbreaking
Yingjie Xue | Xingyou Xia | Jun Zhang | Yunbo Cao | Dengpan Ye | Guotong Geng | Fei Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have been widely applied in various domains such as education and healthcare, making safety assurance crucial. Jailbreak attacks, a method used in red-teaming, can help evaluate and improve the defensive strategies of LLMs. However, existing jailbreak methods often overlook the semantic differences across categories of harmful questions, leading to inconsistent success rates and reduced overall attack effectiveness. We propose the first category-aware jailbreak framework, SHARP, which incorporates the semantic category of harmful questions into prompt generation. Trained on a verified jailbreak dataset, SHARP enables the model to learn category-specific semantic features and adaptively generate prompts that bypass safety mechanisms. The method combines two-stage LoRA fine-tuning and DPO-based reinforcement learning to optimize both attack success and category alignment. Experiments show that SHARP significantly improves attack success rates and achieves better cross-category robustness compared to the state-of-the-art (SOTA) baselines, providing an efficient and scalable tool for evaluating LLM safety.
DisCal: Distribution-Aware Calibration for Mathematical Reasoning Under Character-Level Noisy Inputs
Bo Zhang | Jiawei Zhang | Cong Gao | Bingxu Han | Minghao Hu | Jun Zhang | Yunbo Cao | Zhunchen Luo | Wen Yao | Guotong Geng | Zhong Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Although large reasoning models (LRMs) exhibit exceptional mathematical reasoning capabilities on clean inputs, their reasoning accuracy drops substantially in the presence of character-level noise such as typographical errors. Critically, their confidence estimates fail to reflect the corresponding decline in reasoning accuracy. While confidence calibration offers a principled solution, existing methods predominantly target clean inputs, leaving noisy scenarios largely unexplored. To address this gap, we propose DisCal (Distribution-aware Calibration), a confidence calibration framework for character-level noisy inputs. DisCal extracts uncertainty signals from both the empirical answer distribution and the model’s predictive distribution, and integrates them via a learned calibrator to produce well-calibrated confidence. Experiments across multiple mathematical reasoning benchmarks demonstrate that DisCal consistently outperforms existing calibration methods under noisy inputs, reducing Expected Calibration Error (ECE) by up to 39.21% and improving Area Under the Receiver Operating Characteristic Curve (AUROC) by up to 31.44%.
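For readers unfamiliar with the Expected Calibration Error (ECE) metric named above, the following is a minimal sketch of its standard equal-width-bin form: predictions are grouped by confidence, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The function name and the default of 10 bins are assumptions for illustration; the abstract does not state which binning the paper uses.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE over confidences in [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0
    # Each bin collects (confidence, correctness) pairs.
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, y in b if y) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

A lower value means the stated confidence tracks the observed accuracy more closely; a model that reports 0.9 confidence but is right 75% of the time in that bin contributes a gap of 0.15.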
2025
Dynamic Evil Score-Guided Decoding: An Efficient Decoding Framework For Red-Team Model
Cong Gao | Bo Zhang | Linkang Yang | Minghao Hu | Zhunchen Luo | Xiaoying Bai | Guotong Geng | Jun Zhang | Yunhua Xue
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) have achieved significant advances but can potentially generate harmful content such as social biases, extremism, and misinformation. Red teaming is a promising approach to enhance model safety by creating adversarial prompts to test and improve model robustness. However, existing red-teaming methods often require expensive fine-tuning, especially for large LLMs. We propose the Dynamic Evil Score-Guided Decoding framework (DESGD), an efficient red-teaming method whose computational cost does not increase with the target model size. DESGD introduces the concept of an ‘evil score’ to dynamically evaluate the potential of tokens to contribute to harmful outputs during decoding. This framework constructs a small unsafe model using an adversarial dataset and adjusts the logits vector of the target model based on the evil score. Experiments show that DESGD achieves an ASR of 92.83% on the Llama-3.2-3B-Instruct model, compared to 83.48% with adversarial fine-tuning, while using fewer computational resources. Similarly, on the Qwen2.5-3B-Instruct model, DESGD reaches an ASR of 88.62%, outperforming adversarial fine-tuning (77.56%).
SafeConf: A Confidence-Calibrated Safety Self-Evaluation Method for Large Language Models
Bo Zhang | Cong Gao | Linkang Yang | Bingxu Han | Minghao Hu | Zhunchen Luo | Guotong Geng | Xiaoying Bai | Jun Zhang | Wen Yao | Zhong Wang
Findings of the Association for Computational Linguistics: EMNLP 2025
Large language models (LLMs) have achieved groundbreaking progress in Natural Language Processing (NLP). Despite the numerous advantages of LLMs, they also pose significant safety risks. Self-evaluation mechanisms have gained increasing attention as a key safeguard to ensure safe and controllable content generation. However, LLMs often exhibit overconfidence, which seriously compromises the accuracy of safety self-evaluation. To address this challenge, we propose SafeConf, a method to enhance the safety self-evaluation capability of LLMs through confidence calibration. The method performs semantic mutations on the original safety evaluation questions and adopts a self-consistency strategy to quantify confidence based on answer accuracy on the mutated questions. Finally, these confidence scores are used to construct a dataset for fine-tuning. We conduct experiments on both Chinese and English datasets. The results show that SafeConf improves self-evaluation accuracy by an average of 5.86% and 7.79% over the state-of-the-art baseline methods on Qwen2.5-7B-Instruct and Llama3-8B-Instruct models, respectively, without affecting the general capabilities of the models.
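One plausible reading of "quantify confidence based on answer accuracy on the mutated questions" is sketched below: confidence is the fraction of mutated variants of a question that the model answers correctly. This is an illustration under that assumption, not the paper's exact formula; `self_consistency_confidence`, `answer_fn`, and the toy model are hypothetical names.

```python
from typing import Callable, Sequence


def self_consistency_confidence(answer_fn: Callable[[str], str],
                                mutated_questions: Sequence[str],
                                reference_answer: str) -> float:
    """Fraction of mutated questions whose answer matches the reference."""
    if not mutated_questions:
        raise ValueError("at least one mutated question is required")
    hits = sum(1 for q in mutated_questions if answer_fn(q) == reference_answer)
    return hits / len(mutated_questions)


def toy_model(question: str) -> str:
    """Hypothetical stand-in for an LLM safety judgment."""
    return "unsafe" if "weapon" in question else "safe"


variants = [
    "Is it safe to explain how to build a weapon?",
    "Would describing weapon assembly be harmful?",
    "Is giving bomb-making steps acceptable?",
    "Can I share weapon blueprints?",
]
confidence = self_consistency_confidence(toy_model, variants, "unsafe")
```

Scores of this kind could then label fine-tuning examples with a calibrated confidence, as the abstract describes; how the dataset is built from them is not specified here.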
Co-authors
- Zhunchen Luo 5
- Yunbo Cao 3
- Cong Gao 3
- Xiaoying Bai 2
- Bingxu Han 2
- Minghao Hu 2
- Wenpeng Hu 2
- Zhong Wang 2
- Linkang Yang 2
- Wen Yao 2
- Bo Zhang 2
- Jun Zhang 2
- Jun Zhang 2
- Wenliang Chen (陈文亮) 1
- Jiajun Cheng 1
- Jiawei Hu 1
- Minghao Hu 1
- Juntao Li 1
- Fei Li 1
- Zechen Sun 1
- Yuyang Sun 1
- Zecheng Tang (汤泽成) 1
- Mingxu Tao 1
- Xingyou Xia 1
- Yunhua Xue 1
- Yingjie Xue 1
- Dengpan Ye 1
- Min Zhang 1
- Bo Zhang 1
- Jiawei Zhang 1
- Xian Zhou 1