Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain
Gang Cheng, Haibo Jin, Wenbin Zhang, Haohan Wang, Jun Zhuang
Abstract
Large Language Models (LLMs) are increasingly deployed in finance, where unsafe behavior can lead to serious regulatory risks. However, most red-teaming research focuses on overtly harmful content and overlooks attacks that appear legitimate on the surface yet induce regulatory-violating responses. We address this gap by introducing CoRT, a controllable black-box multi-turn risk-concealed red-teaming framework that progressively conceals surface-level risk while exploiting regulatory-violating behaviors. CoRT contains two key components: (i) a Risk Concealment Attacker (RCA) that generates multi-turn prompts via iterative refinement, and (ii) a Risk Concealment Controller (RCC) that predicts a turn-level Risk Concealment Score (RCS) to steer the style of RCA's follow-up prompts. We also build a domain-specific benchmark, FinRisk-Bench, with 522 instructions spanning six financial risk categories. Experiments on nine widely used LLMs show that CoRT (RCA) achieves an average attack success rate (ASR) of 93.19%, and CoRT (RCA+RCC) further improves the average ASR to 95.00%. Our code and FinRisk-Bench are available at https://github.com/gcheng128/CoRT.
- Anthology ID:
- 2026.acl-long.1903
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 41005–41020
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1903/
- Cite (ACL):
- Gang Cheng, Haibo Jin, Wenbin Zhang, Haohan Wang, and Jun Zhuang. 2026. Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 41005–41020, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Learning to Conceal Risk: Controllable Multi-turn Red Teaming for LLMs in the Financial Domain (Cheng et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1903.pdf
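The abstract reports results as an average attack success rate (ASR) over nine target LLMs. The snippet below is a minimal sketch of how such a figure can be computed. It assumes that ASR is the fraction of instructions judged successful for a given model, and that the average is an unweighted (macro) mean across models. The paper's exact success criterion and aggregation are not specified here. The function names `attack_success_rate` and `average_asr` and the toy data are hypothetical and do not come from the CoRT repository.

```python
from typing import Mapping, Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of attack attempts judged successful (True) for one target model."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    # sum() over bools counts the True entries
    return sum(outcomes) / len(outcomes)


def average_asr(per_model: Mapping[str, Sequence[bool]]) -> float:
    """Unweighted (macro) mean of per-model ASRs; assumes each model counts equally."""
    if not per_model:
        raise ValueError("need at least one model")
    rates = [attack_success_rate(o) for o in per_model.values()]
    return sum(rates) / len(rates)


# Toy, made-up success judgments for two hypothetical target models.
toy = {
    "model_a": [True, True, False, True],
    "model_b": [True, False],
}
print(f"{average_asr(toy) * 100:.2f}%")  # 62.50%
```

A macro average weights every target model equally, regardless of how many attempts were judged for each. If every model is evaluated on the same instruction set, such as all 522 FinRisk-Bench instructions, the macro and pooled (micro) averages coincide.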