Kai Hu
2026
Jailbreak-Zero: A Path to Pareto Optimal Red Teaming for Large Language Models
Kai Hu | Abhinav Aggarwal | Mehran Khodabandeh | David Zhang | Eric Hsin | Li Chen | Ankit Jain | Matt Fredrikson | Akash Bharadwaj
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper presents a novel Automated Red Teaming (ART) framework that shifts from example-based to policy-based evaluation, addressing critical limitations in scalability and validity. We define harmful content through abstract safety policies rather than specific static examples. We also introduce multiple evaluation objectives: risk coverage, semantic diversity, and fidelity, and discover Pareto trade-offs between them. We propose Jailbreak-Zero, a black-box method capable of both zero-shot generation and fine-tuned exploitation of a victim’s vulnerabilities to achieve Pareto optimality. Unlike prior approaches, it does not require expert-designed strategies/prompts, but still achieves superior, human-readable attacks against open-source and proprietary models (attack success rates of 99.5% against GPT-4o and 96.0% against Claude 3.5), even for unseen safety policies. It retains efficacy even after victim models undergo safety alignment, and exposes controls to navigate Pareto trade-offs without retraining. Lastly, we show that Jailbreak-Zero is the best-performing ART method at a given compute budget. Code is available at: https://github.com/hukkai/jailbreak-zero/ .
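To make the notion of a Pareto trade-off among the three objectives concrete, here is a minimal sketch of a Pareto-front filter. It is not taken from the paper or its repository: the function names and the score tuples are hypothetical, and it assumes each attack configuration is summarized as a `(risk_coverage, semantic_diversity, fidelity)` tuple in which higher is better.

```python
from typing import List, Sequence, Tuple

Scores = Tuple[float, float, float]  # (risk_coverage, semantic_diversity, fidelity); assumed layout


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Scores]) -> List[Scores]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores for four attack configurations (illustrative values only).
candidates = [
    (0.9, 0.2, 0.5),
    (0.5, 0.5, 0.5),
    (0.4, 0.4, 0.4),  # dominated by (0.5, 0.5, 0.5)
    (0.9, 0.2, 0.4),  # dominated by (0.9, 0.2, 0.5)
]
front = pareto_front(candidates)
```

The points left in `front` are those where improving one objective would require giving up another, which is the sense in which the abstract describes trade-offs.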
MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning
Jiahang Lin | Kai Hu | Binghai Wang | Yuhao Zhou | Zhiheng Xi | Honglin Guo | Shichun Liu | Junzhe Wang | Shihan Dou | Enyu Zhou | Hang Yan | Zhenhua Han | Tao Gui | Qi Zhang | Xuanjing Huang
Findings of the Association for Computational Linguistics: ACL 2026
Conventional Retrieval-Augmented Generation (RAG) systems often struggle with complex multi-hop queries over long documents because they retrieve in a single pass. We introduce **MM-Doc-R1**, a novel framework that employs an agentic, vision-aware workflow to address long-document visual question answering through iterative information discovery and synthesis. To incentivize the information-seeking capabilities of our agents, we propose **Similarity-based Policy Optimization (SPO)**, which addresses baseline estimation bias in existing multi-turn reinforcement learning (RL) algorithms such as GRPO. Our core insight is that in multi-turn RL, the more semantically similar two trajectories are, the more accurate their shared baseline estimation becomes. Leveraging this, SPO calculates a more precise baseline by similarity-weighted averaging of rewards across multiple trajectories, unlike GRPO, which inappropriately applies the initial state’s baseline to all intermediate states. This provides a more stable and accurate learning signal for our agents, leading to training performance that surpasses GRPO. Our experiments on the MMLongBench-Doc benchmark show that **MM-Doc-R1** outperforms previous baselines by **10.4%**. Furthermore, **SPO** outperforms **GRPO**, boosting results by **5.0%** with Qwen3-8B and **6.1%** with Qwen3-4B. These results highlight the effectiveness of our integrated framework and novel training algorithm in advancing the state of the art for complex, long-document visual question answering.
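The similarity-weighted baseline described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure, the weighting function, or the granularity of the baseline, so the cosine similarity, the softmax-style weights, the `temperature` parameter, the per-trajectory embeddings, and all function names here are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_weighted_baselines(
    embeddings: List[Sequence[float]],
    rewards: List[float],
    temperature: float = 0.1,  # assumed hyperparameter, not from the paper
) -> List[float]:
    """For each trajectory, average all rewards weighted by similarity to it."""
    baselines = []
    for e_i in embeddings:
        weights = [math.exp(cosine(e_i, e_j) / temperature) for e_j in embeddings]
        total = sum(weights)
        baselines.append(sum(w * r for w, r in zip(weights, rewards)) / total)
    return baselines


def group_mean_baseline(rewards: List[float]) -> float:
    """Single shared baseline: the plain mean of the group's rewards."""
    return sum(rewards) / len(rewards)


# Hypothetical group of four trajectories forming two semantic clusters.
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
rewards = [1.0, 0.0, 0.0, 0.0]
baselines = similarity_weighted_baselines(embeddings, rewards)
advantages = [r - b for r, b in zip(rewards, baselines)]
```

In this toy group, the shared mean baseline is 0.25 for every trajectory, whereas the weighted baseline stays close to each cluster's own mean reward, so trajectories are compared mainly against semantically similar ones.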
2025
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
Yicheng Chen | Yining Li | Kai Hu | Ma Zerun | Haochen Ye | Kai Chen
Findings of the Association for Computational Linguistics: ACL 2025
Data quality and diversity are key to the construction of effective instruction-tuning datasets. With the increasing availability of open-source instruction-tuning datasets, it is advantageous to automatically select high-quality and diverse subsets from a vast amount of data. Existing methods typically prioritize instance quality and use heuristic rules to maintain diversity. However, because they lack a comprehensive view of the entire collection, these methods often produce suboptimal results. Moreover, heuristic rules generally focus on distance or clustering within the embedding space, which fails to accurately capture the intent of complex instructions in the semantic space. To bridge this gap, we propose a unified method for quantifying the information content of datasets. This method models the semantic space by constructing a label graph and quantifies diversity based on the distribution of information within the graph. Based on this measurement, we further introduce an efficient sampling method that selects data samples iteratively to Maximize the Information Gain (MIG) in semantic space. Experiments on various datasets and base models demonstrate that MIG consistently outperforms state-of-the-art methods. Notably, the model fine-tuned with 5% Tulu3 data sampled by MIG achieves comparable performance to the official SFT model trained on the full dataset, with improvements of +5.73% on AlpacaEval and +6.89% on WildBench.
2022
The VolcTrans System for WMT22 Multilingual Machine Translation Task
Xian Qian | Kai Hu | Jiaqiang Wang | Yifeng Liu | Xingyuan Pan | Jun Cao | Mingxuan Wang
Proceedings of the Seventh Conference on Machine Translation (WMT)
This report describes our VolcTrans system for the WMT22 shared task on large-scale multilingual machine translation. We participated in the unconstrained track, which allows the use of external resources. Our system is a transformer-based multilingual model trained on data from multiple sources, including the public training set from the data track, NLLB data provided by Meta AI, self-collected parallel corpora, and pseudo bitext from back-translation. Both bilingual and monolingual texts are cleaned by a series of heuristic rules. On the official test set, our system achieves 17.3 BLEU, 21.9 spBLEU, and 41.9 chrF2++ on average over all language pairs. The average inference speed is 11.5 sentences per second on a single NVIDIA Tesla V100 GPU.
Co-authors
- Abhinav Aggarwal 1
- Akash Bharadwaj 1
- Jun Cao 1
- Yicheng Chen 1
- Kai Chen 1
- Li Chen 1
- Shihan Dou 1
- Matt Fredrikson 1
- Tao Gui 1
- Honglin Guo 1
- Zhenhua Han 1
- Haochen Ye 1
- Eric Hsin 1
- Xuan-Jing Huang (黄萱菁) 1
- Ankit Jain 1
- Mehran Khodabandeh 1
- Yining Li 1
- Jiahang Lin 1
- Yifeng Liu 1
- Shichun Liu 1
- Xingyuan Pan 1
- Xian Qian 1
- Jiaqiang Wang 1
- Mingxuan Wang 1
- Binghai Wang 1
- Junzhe Wang 1
- Zhiheng Xi 1
- Hang Yan 1
- Ma Zerun 1
- David Zhang 1
- Qi Zhang 1
- Yuhao Zhou 1
- Enyu Zhou 1