Haitao Li
2026
ReStyle-TTS: Relative and Continuous Style Control for Zero-Shot Speech Synthesis
Haitao Li | Chunxiang Jin | Chenglin Li | Wenhao Guan | Zhengxing Huang | Xie Chen
Findings of the Association for Computational Linguistics: ACL 2026
Zero-shot text-to-speech models can clone a speaker’s timbre from a short reference audio, but they also strongly inherit the speaking style present in the reference. As a result, synthesizing speech with a desired style often requires carefully selecting reference audio, which is impractical when only limited or mismatched references are available. While recent controllable TTS methods attempt to address this issue, they typically rely on absolute style targets and discrete textual prompts, and therefore do not support continuous and reference-relative style control. We propose ReStyle-TTS, a framework that enables continuous and reference-relative style control in zero-shot TTS. Our key insight is that effective style control requires first reducing the model’s implicit dependence on reference style before introducing explicit control mechanisms. To this end, we introduce Decoupled Classifier-Free Guidance (DCFG), which independently controls text and reference guidance, reducing reliance on reference style while preserving text fidelity. On top of this, we apply style-specific LoRAs together with Orthogonal LoRA Fusion to enable continuous and disentangled multi-attribute control, and introduce a Timbre Consistency Optimization module to mitigate timbre drift caused by weakened reference guidance. Experiments show that ReStyle-TTS enables user-friendly, continuous, and relative control over pitch, energy, and multiple emotions while maintaining intelligibility and speaker timbre, and performs robustly in challenging mismatched reference–target style scenarios. Code and data are available in supplementary materials.
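The idea of controlling text and reference guidance independently can be sketched as below. The abstract does not give the DCFG formula, so the three-way decomposition here (unconditional, text-only, text-plus-reference outputs) and the names `decoupled_cfg`, `w_text`, and `w_ref` are assumptions made for illustration, not the authors' implementation.

```python
from typing import List


def decoupled_cfg(
    uncond: List[float],
    text_only: List[float],
    text_and_ref: List[float],
    w_text: float,
    w_ref: float,
) -> List[float]:
    """Combine three model outputs using separate guidance scales.

    Assumed form (hypothetical, not taken from the paper):
        out = uncond + w_text * (text_only - uncond)
                     + w_ref  * (text_and_ref - text_only)
    Lowering w_ref weakens the pull toward the reference audio
    while w_text keeps the text condition at full strength.
    """
    if not (len(uncond) == len(text_only) == len(text_and_ref)):
        raise ValueError("all three outputs must have the same length")
    return [
        u + w_text * (t - u) + w_ref * (f - t)
        for u, t, f in zip(uncond, text_only, text_and_ref)
    ]


# Toy vectors standing in for per-frame model predictions.
uncond = [0.0, 1.0]
text_only = [1.0, 2.0]
text_and_ref = [2.0, 4.0]

full = decoupled_cfg(uncond, text_only, text_and_ref, w_text=1.0, w_ref=1.0)
no_ref = decoupled_cfg(uncond, text_only, text_and_ref, w_text=1.0, w_ref=0.0)
```

Under this assumed form, setting both scales to 1 recovers the fully conditioned output, and setting `w_ref` to 0 recovers the text-only output, which is the sense in which reliance on the reference style is reduced.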
Chinese Court Simulation with LLM-Based Agents System
Kaiyuan Zhang | Jiaqi Li | Yueyue Wu | Haitao Li | Cheng Luo | Shaokun Zou | Yujia Zhou | Weihang Su | Yiqun Liu | Qingyao Ai
Findings of the Association for Computational Linguistics: ACL 2026
Mock trials have long served as an important platform for professional legal training and education. Traditional mock trials are difficult for the public to access because they rely on professional tutors and human participants. Fortunately, the rise of large language models (LLMs) provides new opportunities for creating more accessible and scalable court simulations. While promising, existing research has ignored the systematic design and procedure evaluation of court simulations, which are critical to the credibility and usage of court simulation in practice. To this end, we propose a novel court simulation paradigm, SimCourt, based on the real-world procedure structure of Chinese courts, and design a comprehensive evaluation framework focusing on both legal judgment prediction and court procedure analysis. Experiments show that our framework can generate simulated trials that better guide the system in predicting the imprisonment, probation, and fine of each case. Further procedure evaluations show that agents’ responses under our simulation framework even outperform judges and lawyers from the real trials in many aspects. These results demonstrate the potential of LLM-based court simulation.
VideoPro: Adaptive Program Reasoning for Long Video Understanding
Chenglin Li | Feng Han | Yikun Wang | Ruilin Li | Shuai Dong | Haowen Hou | Haitao Li | Qianglong Chen | Feng Tao | Jingqi Tong | Yin Zhang | Jiaqi Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Understanding long videos remains challenging due to the sparsity of visual evidence relevant to a given query. Prior work has explored program-based visual grounding, typically relying on executable programs generated by auxiliary large language models. However, when scaling to long videos, existing approaches face several critical limitations: (1) frame-centric vision modules are often insufficient for long video processing; (2) naively applying program-based reasoning to all queries incurs considerable computational overhead; and (3) errors arising from low-confidence predictions and imperfect program execution are difficult to recover from. To address these challenges, we propose VideoPro, a unified framework that enables VideoLLMs to adaptively reason over long videos and refine their predictions through executable programs. VideoPro first performs adaptive reasoning, dynamically determining whether a query can be resolved directly by the native VideoLLM or requires explicit multi-step program reasoning. For complex queries, the model decomposes the task into executable programs that invoke specialized vision modules for precise temporal and semantic grounding. To further improve robustness, VideoPro incorporates a self-refinement mechanism that leverages execution feedback and confidence signals to correct erroneous executions and refine low-confidence reasoning programs. By tightly integrating adaptive reasoning with self-refinement, VideoPro consistently outperforms prior methods across multiple long-video understanding benchmarks, yielding an average 6.7% improvement for Qwen3-VL-8B.
Beyond Experience Retrieval: Learning to Generate Utility-Optimized Structured Experience for Frozen LLMs
Xuancheng Li | Haitao Li | Yujia Zhou | Yiqun Liu | Qingyao Ai
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) are largely static and often redo reasoning or repeat mistakes. Prior experience reuse typically relies on external retrieval, which is similarity-based, can introduce noise, and adds latency. We introduce SEAM (Structured Experience Adapter Module), a lightweight, executor-specific plug-in that stores experience in its parameters and generates a structured, instance-tailored experience entry in a single forward pass to guide a frozen LLM executor. SEAM is trained for utility via executor rollouts and GRPO while keeping the executor frozen, and can be further improved with logged-success SFT after deployment. Experiments on mathematical reasoning benchmarks show consistent accuracy gains across executors with low overhead. Extensive ablation and analysis further elucidate the mechanisms underlying SEAM’s effectiveness and robustness. We release our code at https://anonymous.4open.science/r/SEAM.
2025
SelfRACG: Enabling LLMs to Self-Express and Retrieve for Code Generation
Qian Dong | Jia Chen | Qingyao Ai | Hongning Wang | Haitao Li | Yiwu | Yao Hu | Yiqun Liu | Shaoping Ma
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Existing retrieval-augmented code generation (RACG) methods typically use an external retrieval module to fetch semantically similar code snippets used for generating subsequent fragments. However, even for consecutive code fragments, the content often diverges due to logical progression, resulting in a content gap. This gap undermines the performance of current RACG methods, as external retrieval modules based on content matching fail to infer the specific information need of LLMs to generate the next code fragment. Therefore, we propose SelfRACG, a novel paradigm that enables large language models (LLMs) to Self-express their information needs to enhance RACG. Specifically, SelfRACG includes an information need expression module and a two-stage information need-guided training strategy, which encourages LLMs to express their information need. Extensive experiments demonstrate that SelfRACG can retrieve external knowledge that better aligns with the LLM’s own information needs, resulting in superior generation performance compared to vanilla RACG. Moreover, both the training and deployment costs for retrieval in our framework are much lower than those of the strongest retrieval model.
CalibraEval: Calibrating Prediction Distribution to Mitigate Selection Bias in LLMs-as-Judges
Haitao Li | Junjie Chen | Qingyao Ai | Zhumin Chu | Yujia Zhou | Qian Dong | Yiqun Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The use of large language models (LLMs) as automated evaluation tools to assess the quality of generated natural language, known as “LLMs-as-Judges”, has demonstrated promising capabilities and is rapidly gaining widespread attention. However, when applied to pairwise comparisons of candidate responses, LLM-based evaluators often exhibit selection bias. Specifically, their judgments may become inconsistent when the option positions or ID tokens are swapped, compromising the effectiveness and fairness of the evaluation result. To address this challenge, we introduce CalibraEval, a novel label-free method for mitigating selection bias during inference. Specifically, CalibraEval reformulates debiasing as an optimization task aimed at adjusting observed prediction distributions to align with unbiased prediction distributions. To solve this optimization problem, we propose a non-parametric order-preserving algorithm (NOA). This algorithm leverages the partial order relationships between model prediction distributions, thereby eliminating the need for explicit labels and precise mathematical function modeling. Empirical evaluations of LLMs on multiple representative benchmarks demonstrate that CalibraEval effectively mitigates selection bias and improves performance compared to existing debiasing methods. This work marks a step toward building more robust and unbiased automated evaluation frameworks, paving the way for improved reliability in AI-driven assessments. The code can be found at https://github.com/CSHaitao/CalibraEval.
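The selection bias described above, where a judge's verdict changes when the two options swap positions, can be measured with a simple consistency check. The sketch below is not CalibraEval or NOA; it only illustrates the symptom the method targets. The `Judge` signature, `position_consistency`, and the two toy judges are hypothetical stand-ins for a real LLM evaluator.

```python
from typing import Callable, List, Tuple

# Hypothetical judge interface: (question, option_a, option_b) -> "A" or "B".
Judge = Callable[[str, str, str], str]


def position_consistency(judge: Judge, pairs: List[Tuple[str, str, str]]) -> float:
    """Fraction of pairs where the same response wins in both orderings."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    consistent = 0
    for question, r1, r2 in pairs:
        first = judge(question, r1, r2)   # r1 shown in slot A
        second = judge(question, r2, r1)  # r1 shown in slot B
        winner_first = r1 if first == "A" else r2
        winner_second = r2 if second == "A" else r1
        if winner_first == winner_second:
            consistent += 1
    return consistent / len(pairs)


def always_a(question: str, a: str, b: str) -> str:
    """Toy judge with maximal position bias: always picks slot A."""
    return "A"


def longer_wins(question: str, a: str, b: str) -> str:
    """Toy judge that ignores position and prefers the longer response."""
    return "A" if len(a) >= len(b) else "B"


pairs = [
    ("q1", "short", "a much longer answer"),
    ("q2", "detailed reply here", "ok"),
]
biased_score = position_consistency(always_a, pairs)
unbiased_score = position_consistency(longer_wins, pairs)
```

A judge that always picks the first slot scores 0.0 here, while a position-independent judge scores 1.0; a debiasing method would aim to move a real evaluator toward the latter.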
LegalAgentBench: Evaluating LLM Agents in Legal Domain
Haitao Li | Junjie Chen | Jingli Yang | Qingyao Ai | Wei Jia | Youfeng Liu | Kai Lin | Yueyue Wu | Guozhi Yuan | Yiran Hu | Wuyue Wang | Yiqun Liu | Minlie Huang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
With the increasing intelligence and autonomy of LLM Agents, their potential applications in the legal domain are becoming increasingly apparent. However, existing general-domain benchmarks are unable to fully capture the complexity and subtle nuances inherent in real-world judicial cognition and decision-making. Therefore, we propose LegalAgentBench, a comprehensive benchmark specifically designed to evaluate LLM Agents in the Chinese legal domain. LegalAgentBench includes 17 corpora from real-world legal scenarios and provides 37 tools for interacting with external knowledge. To cover tasks of varying difficulty and types, we designed a scalable task construction process that enables a more precise evaluation of performance in both tool utilization and reasoning. Moreover, beyond assessing performance through the success rate of final outcomes, LegalAgentBench incorporates keyword analysis during intermediate processes to calculate progress rates, facilitating a more fine-grained evaluation. We evaluated eight popular LLMs, highlighting the strengths, limitations, and potential areas for improvement of existing models and methods. LegalAgentBench sets a new benchmark for the practical application of LLMs in the legal domain, with its code and data available at https://github.com/CSHaitao/LegalAgentBench.
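A keyword-based progress rate of the kind mentioned above could look roughly like the following. The abstract does not specify the scoring rule, so the plain substring match and the names `progress_rate`, `trace`, and `keywords` are assumptions for illustration only, not the benchmark's actual metric.

```python
from typing import List


def progress_rate(trace: str, keywords: List[str]) -> float:
    """Share of expected intermediate keywords that appear in an agent trace.

    Assumption: a keyword counts as reached if it occurs verbatim
    (case-insensitive) anywhere in the trace text.
    """
    if not keywords:
        raise ValueError("keywords must not be empty")
    lowered = trace.lower()
    hits = sum(1 for kw in keywords if kw.lower() in lowered)
    return hits / len(keywords)


# Hypothetical agent trace and expected intermediate keywords.
trace = "Looked up the company registry, then retrieved the case number."
keywords = ["company registry", "case number", "judgment date", "court name"]
rate = progress_rate(trace, keywords)
```

In this toy example the agent reached two of the four expected intermediate facts, so it receives partial credit even if its final answer were wrong, which is the finer-grained signal a success rate alone would miss.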
Co-authors
- Qingyao Ai 5
- Yiqun Liu 5
- Yujia Zhou 3
- Junjie Chen 2
- Qian Dong 2
- Chenglin Li 2
- Yueyue Wu 2
- Jia Chen 1
- Qianglong Chen 1
- Xie Chen 1
- Zhumin Chu 1
- Shuai Dong 1
- Wenhao Guan 1
- Yiran Hu 1
- Feng Han 1
- Haowen Hou 1
- Yao Hu 1
- Minlie Huang 1
- Zhengxing Huang 1
- Wei Jia 1
- Chunxiang Jin 1
- Jiaqi Li 1
- Ruilin Li 1
- Xuancheng Li 1
- Kai Lin 1
- Youfeng Liu 1
- Cheng Luo 1
- Shaoping Ma 1
- Weihang Su 1
- Feng Tao 1
- Jingqi Tong 1
- Hongning Wang 1
- Jiaqi Wang 1
- Wuyue Wang 1
- Yikun Wang 1
- Jingli Yang 1
- Yiwu 1
- Guozhi Yuan 1
- Kaiyuan Zhang 1
- Yin Zhang 1
- Shaokun Zou 1