Tong Yang
2026
TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment
Zhewen Tan | Wenhan Yu | Jianfeng Si | Tongxin Liu | Kaiqi Guan | Huiyan Jin | Jiawen Tao | Xiaokun Yuan | Xiangzheng Zhang | Duohe Ma | Tong Yang | Lin Sun
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In recent years, safety risks associated with large language models have become increasingly prominent, highlighting the urgent need to mitigate the generation of toxic and harmful content. The mainstream paradigm for LLM safety alignment typically adopts a collaborative framework involving three roles: an attacker for adversarial prompt generation, a defender for safety defense, and an evaluator for response assessment. In this paper, we propose TriPlay-RL, a closed-loop reinforcement learning framework that enables iterative, co-improving collaboration among the three roles with near-zero manual annotation. Experimental results show that the attacker preserves high output diversity while achieving a 20%–50% improvement in adversarial effectiveness. The defender attains 10%–30% gains in safety performance without degrading general reasoning capability, and the evaluator continuously refines its fine-grained judgment ability through iterations, accurately distinguishing unsafe responses, simple refusals, and useful guidance. Overall, our framework establishes an efficient and scalable paradigm for LLM safety alignment, enabling continuous co-evolution within a unified learning loop. The code is available at https://github.com/Qihoo360/TriPlay-RL.
ARC: Active and Reflection-driven Context Management for Long-Horizon Information Seeking Agents
Yilun Yao | Shan Huang | Elsie Dai | Zhewen Tan | Zhenyu Duan | Shousheng Jia | Yanbing Jiang | Tong Yang
Findings of the Association for Computational Linguistics: ACL 2026
Large language models are increasingly deployed as research agents for deep search and long-horizon information seeking, yet their performance often degrades as interaction histories grow. This degradation, known as context rot, reflects a failure to maintain coherent and task-relevant internal states over extended reasoning horizons. Existing approaches primarily manage context through raw accumulation or passive summarization, treating it as a static artifact and allowing early errors or misplaced emphasis to persist. Motivated by this limitation, we propose ARC, the first framework to systematically formulate context management as an active, reflection-driven process that treats context as a dynamic internal reasoning state during execution. ARC operationalizes this view through reflection-driven monitoring and revision, allowing agents to actively reorganize their working context when misalignment or degradation is detected. Experiments on challenging long-horizon information-seeking benchmarks show that ARC consistently outperforms passive context compression methods, achieving up to an 11% absolute improvement in accuracy on BrowseComp-ZH with Qwen2.5-32B-Instruct.
2025
HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference
Ping Gong | Jiawei Yi | Shengnan Wang | Juncheng Zhang | Zewen Jin | Ouxiang Zhou | Ruibo Liu | Guanbin Xu | Youhui Bai | Bowen Ye | Kun Yuan | Tong Yang | Gong Zhang | Renhai Chen | Feng Wu | Cheng Li
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models (LLMs) have emerged as a pivotal research area, yet the attention module remains a critical bottleneck in LLM inference, even with techniques like KVCache to mitigate redundant computations. While various top-k attention mechanisms have been proposed to accelerate LLM inference by exploiting the inherent sparsity of attention, they often struggle to strike a balance between efficiency and accuracy. In this paper, we introduce HATA (Hash-Aware Top-k Attention), a novel approach that systematically integrates low-overhead learning-to-hash techniques into the top-k attention process. Unlike existing top-k attention methods, which seek an absolute estimate of the qk score, typically at great cost, HATA maps queries and keys into binary hash codes and obtains the relative order of qk scores at low cost, which is sufficient for realizing top-k attention. Extensive experiments demonstrate that HATA achieves up to 7.2× speedup compared to vanilla full attention while maintaining model accuracy. In addition, HATA outperforms state-of-the-art top-k attention methods in both accuracy and efficiency across multiple mainstream LLMs and diverse tasks. HATA is open source at https://github.com/gpzlx1/HATA.
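The general idea in this abstract, ranking keys by the similarity of binary hash codes and then attending only to the top k, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes random-hyperplane (sign) hashing as a stand-in for HATA's learned hash functions, a single query vector, and NumPy. The names `hash_codes`, `hash_topk_attention`, and `full_attention` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def hash_codes(x, proj):
    """Map vectors to binary codes via the sign of a linear projection.

    Assumption: `proj` is a random matrix here; HATA learns its hash
    functions, which this sketch does not attempt to reproduce.
    """
    return (x @ proj) > 0


def full_attention(q, K, V):
    """Reference softmax attention for a single query vector."""
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V


def hash_topk_attention(q, K, V, proj, k):
    """Select k keys by Hamming distance between hash codes, then run
    exact softmax attention over only the selected keys."""
    q_code = hash_codes(q, proj)                 # shape: (bits,)
    k_codes = hash_codes(K, proj)                # shape: (n, bits)
    # Hamming distance gives only a relative ordering of the keys,
    # which is all that top-k selection needs.
    hamming = np.count_nonzero(k_codes != q_code, axis=1)
    idx = np.argsort(hamming, kind="stable")[:k]
    return full_attention(q, K[idx], V[idx]), idx


# Small demo with synthetic data.
rng = np.random.default_rng(0)
n, d, bits = 64, 16, 128
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
q = rng.standard_normal(d)
proj = rng.standard_normal((d, bits))
out, idx = hash_topk_attention(q, K, V, proj, k=8)
```

In this sketch the exact qk scores are computed only for the k selected keys. The candidate ranking itself uses bitwise comparisons, which is where a hash-based method would save work on long sequences.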
Co-authors
- Zhewen Tan 2
- Youhui Bai 1
- Renhai Chen 1
- Elsie Dai 1
- Zhenyu Duan 1
- Ping Gong 1
- Kaiqi Guan 1
- Shan Huang 1
- Shousheng Jia 1
- Yanbing Jiang 1
- Huiyan Jin 1
- Zewen Jin 1
- Cheng Li 1
- Tongxin Liu 1
- Ruibo Liu 1
- Duohe Ma 1
- Jianfeng Si 1
- Lin Sun 1
- Jiawen Tao 1
- Shengnan Wang 1
- Feng Wu 1
- Guanbin Xu 1
- Yilun Yao 1
- Bowen Ye 1
- Jiawei Yi 1
- Wenhan Yu 1
- Xiaokun Yuan 1
- Kun Yuan 1
- Xiangzheng Zhang 1
- Juncheng Zhang 1
- Gong Zhang 1
- Ouxiang Zhou 1