Chao Ye
2026
Everyone is unique: Towards Behaviorally Heterogeneous Negotiation Dialogue Systems for Debt Collection
Yuhang Yang | Kai Tang | Chao Ye | Haobo Wang | Qiqi Luo | Jin Guang Zheng | Zhixin Zhang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Debt collection is a critical negotiation task in the financial industry, with strong practical relevance and exceptional academic value as a behaviorally rich, high-stakes testbed for human-centered dialogue systems. While large language models (LLMs) have shown promise in dialogue and negotiation, effectively evaluating their performance in these complex scenarios remains a major challenge: existing benchmarks uniformly assume users to be static, rational agents with fixed preferences, failing to capture the rich behavioral heterogeneity inherent in real-world debt collection. To bridge this gap, we propose DebtBench, the first public persona-enriched debt collection benchmark that highlights behavioral heterogeneity in negotiation. Moreover, we develop DebtGPT, a debt collection agent trained to jointly optimize financial recovery and interaction experience. Our experimental results, using 16 state-of-the-art LLMs, show that most existing models struggle in these complex but realistic scenarios, whereas DebtGPT outperforms all open-source baselines and achieves performance on par with GPT-4o. The code and data are available at https://github.com/yyuhhhh13/DebtNegotiation.
A Novel Matching Paradigm: Unified Generative and Discriminative LLM with Prompt Compression for Relevance Learning
Guoliang Zhao | Zixin Cui | Chao Ye | Dengwu He | Fei Huang | Yubo Liu | Shuanglong Li | Tzungren Kuo | Bin Ding | Shuang Zhang | KunhongZhu | Zhi Guo | Liu Lin
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
The matching paradigm is fundamental to large-scale information retrieval and is widely used in industrial search and advertising systems. Existing approaches employ Large Language Models (LLMs) primarily as feature extractors, underutilizing their full modeling capabilities. To address this limitation, we propose a novel matching paradigm, termed the Unified Generative and Discriminative LLM (UGD). It integrates two-tower, single-tower, and generative tasks within a unified LLM framework via attention-mask partitioning, enabling generative tasks to serve as auxiliary supervision for discriminative learning and facilitating distillation from single-tower to two-tower architectures through a multi-task fine-tuning mechanism. To satisfy online latency constraints, we further introduce a self-distillation variant of UGD with a KMeans-enhanced linearized RQVAE for prompt compression and quantization. This design compresses and quantizes landing-page documents during inference, improving serving efficiency and reducing storage overhead. Extensive experiments show that UGD achieves superior performance and strong practical value. The framework has been deployed in an industrial search engine serving hundreds of millions of users and hundreds of thousands of advertisers, significantly enhancing search experience. Open access upon publication.
2025
Jailbreaking Prompt Attack: A Controllable Adversarial Attack against Diffusion Models
Jiachen Ma | Yijiang Li | Zhiqing Xiao | Anda Cao | Jie Zhang | Chao Ye | Junbo Zhao
Findings of the Association for Computational Linguistics: NAACL 2025
Text-to-image (T2I) models can be maliciously used to generate harmful content such as sexually explicit, unfaithful, and misleading or Not-Safe-for-Work (NSFW) images. Previous attacks largely depend on the availability of the diffusion model or involve a lengthy optimization process. In this work, we investigate a more practical and universal attack that does not require the presence of a target model and demonstrate that the high-dimensional text embedding space inherently contains NSFW concepts that can be exploited to generate harmful images. We present the Jailbreaking Prompt Attack (JPA). JPA first searches for the target malicious concepts in the text embedding space using a group of antonyms generated by ChatGPT. Subsequently, a prefix prompt is optimized in the discrete vocabulary space to align malicious concepts semantically in the text embedding space. We further introduce a soft assignment with gradient masking technique that allows us to perform gradient ascent in the discrete vocabulary space. We perform extensive experiments with open-source T2I models, e.g., stable-diffusion-v1-4, and closed-source online services, e.g., DALL·E 2 and Midjourney, with black-box safety checkers. Results show that JPA (1) bypasses both text and image safety checkers, (2) preserves high semantic alignment with the target prompt, and (3) runs much faster than previous methods and can be executed in a fully automated manner. These merits render it a valuable tool for robustness evaluation in future text-to-image generation research.
LongTableBench: Benchmarking Long-Context Table Reasoning across Real-World Formats and Domains
Liyao Li | Jiaming Tian | Hao Chen | Wentao Ye | Chao Ye | Haobo Wang | Ningtao Wang | Xing Fu | Gang Chen | Junbo Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025
We introduce LongTableBench, a benchmark for evaluating long-context reasoning over semi-structured tables across diverse formats, tasks, and domains. It comprises 5,950 QA instances spanning 7 table formats (e.g., Markdown, HTML, SQL), 18 domains, and input lengths up to 128K tokens, including multi-turn and multi-table settings. To ensure data quality, we combine symbolic supervision, cross-model validation, and human review. Evaluating 52 LLMs—including general-purpose, table-specific, and reasoning-enhanced models—reveals that only the strongest models maintain robust performance under increasing context lengths and format diversity. We further show that end-to-end models outperform compression-based approaches, especially on tasks requiring semantic integration. LongTableBench provides a rigorous, scalable testbed for advancing long-context tabular understanding and highlights key limitations in current LLMs’ structural and reasoning capabilities. The code and data are available at https://github.com/liyaooi/LongTableBench.
RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis
Pengzuo Wu | Yuhang Yang | Guangcheng Zhu | Chao Ye | Hong Gu | Xu Lu | Ruixuan Xiao | Bowen Bao | Yijing He | Liangyu Zha | Wentao Ye | Junbo Zhao | Haobo Wang
Findings of the Association for Computational Linguistics: ACL 2025
With the rapid advancement of Large Language Models (LLMs), there is an increasing need for challenging benchmarks to evaluate their capabilities in handling complex tabular data. However, existing benchmarks are either based on outdated data setups or focus solely on simple, flat table structures. In this paper, we introduce RealHiTBench, a comprehensive benchmark designed to evaluate the performance of both LLMs and Multimodal LLMs (MLLMs) across a variety of input formats for complex tabular data, including LaTeX, HTML, and PNG. RealHiTBench also includes a diverse collection of tables with intricate structures, spanning a wide range of task types. Our experimental results, using 25 state-of-the-art LLMs, demonstrate that RealHiTBench is indeed a challenging benchmark. Moreover, we develop TreeThinker, a tree-based agent that organizes hierarchical headers into a tree structure for enhanced tabular reasoning, validating the importance of improving LLMs’ perception of table hierarchies. We hope that our work will inspire further research on tabular data reasoning and the development of more robust models. The code and data are available at https://github.com/cspzyy/RealHiTBench.
Co-authors
- Haobo Wang 3
- Junbo Zhao 3
- Yuhang Yang 2
- Wentao Ye 2
- Bowen Bao 1
- Anda Cao 1
- Gang Chen 1
- Hao Chen 1
- Zixin Cui 1
- Bin Ding 1
- Xing Fu 1
- Hong Gu 1
- Zhi Guo 1
- Dengwu He 1
- Yijing He 1
- Fei Huang 1
- KunhongZhu 1
- Tzungren Kuo 1
- Liyao Li 1
- Shuanglong Li 1
- Yijiang Li 1
- Liu Lin 1
- Yubo Liu 1
- Xu Lu 1
- Qiqi Luo 1
- Jiachen Ma 1
- Kai Tang 1
- Jiaming Tian 1
- Ningtao Wang 1
- Pengzuo Wu 1
- Ruixuan Xiao 1
- Zhiqing Xiao 1
- Liangyu Zha 1
- Jie Zhang 1
- Shuang Zhang 1
- Zhixin Zhang 1
- Guoliang Zhao 1
- Jin Guang Zheng 1
- Guangcheng Zhu 1