Proxy Barrier: A Hidden Repeater Layer Defense Against System Prompt Leakage and Jailbreaking

Pedro Schindler Freire Brasil Ribeiro, Iago Alves Brito, Rafael Teixeira Sousa, Fernanda Bufon Färber, Julia Soares Dollis, Arlindo Rodrigues Galvão Filho


Abstract
Prompt injection and jailbreak attacks remain a critical vulnerability for deployed large language models (LLMs), allowing adversaries to bypass safety protocols and extract sensitive information. To address this, we present Proxy Barrier (ProB), a lightweight defense that interposes a proxy LLM between the user and the target model. The proxy LLM is tasked solely with repeating the user input; any failure to do so signals an attempt to reveal or override system instructions, so the malicious request is detected and blocked before it reaches the target model. ProB therefore requires no access to model weights or prompts and is deployable entirely at the API level. Experiments across multiple model families demonstrate that ProB achieves state-of-the-art resilience against prompt leakage and jailbreak attacks. Notably, our approach outperforms baselines, achieving up to 98.8% defense effectiveness, and provides robust protection across both open- and closed-source LLMs when suitably paired with proxy models, while keeping response quality intact.
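The repeat-and-compare idea described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the repeater instruction text, the `difflib`-based similarity check, the 0.95 threshold, the choice to forward the original input unchanged, and the names `proxy_barrier`, `is_faithful_repeat`, `proxy_llm`, and `target_llm`. The two models are represented as plain callables so that any API client could be plugged in.

```python
from difflib import SequenceMatcher
from typing import Callable

# Hypothetical repeater instruction; the paper's actual proxy prompt is not given here.
REPEATER_INSTRUCTION = (
    "Repeat the following text exactly, without adding or removing anything:\n\n"
)
# Hypothetical refusal message returned when a request is blocked.
BLOCKED_MESSAGE = "Request blocked."


def is_faithful_repeat(original: str, echoed: str, threshold: float = 0.95) -> bool:
    """Return True if the proxy's output is close enough to the user input.

    The similarity measure and threshold are assumptions for this sketch.
    """
    # Collapse whitespace so harmless formatting differences are ignored.
    a = " ".join(original.split())
    b = " ".join(echoed.split())
    return SequenceMatcher(None, a, b).ratio() >= threshold


def proxy_barrier(
    user_input: str,
    proxy_llm: Callable[[str], str],
    target_llm: Callable[[str], str],
    threshold: float = 0.95,
) -> str:
    """Forward user_input to target_llm only if proxy_llm repeats it faithfully."""
    echoed = proxy_llm(REPEATER_INSTRUCTION + user_input)
    if not is_faithful_repeat(user_input, echoed, threshold):
        # The proxy deviated from pure repetition: treat the input as an attack.
        return BLOCKED_MESSAGE
    # Assumption: the original input (not the echoed copy) is forwarded.
    return target_llm(user_input)
```

In this sketch a benign question that the proxy echoes verbatim is passed through to `target_llm`, while an input that hijacks the proxy into producing anything other than the repeated text yields `BLOCKED_MESSAGE`. Because both models are ordinary callables, the check runs entirely outside the target model, in keeping with the API-level deployment the abstract describes.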
Anthology ID: 2025.findings-emnlp.528
Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
Month: November
Year: 2025
Address: Suzhou, China
Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 9960–9975
URL: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.528/
DOI: 10.18653/v1/2025.findings-emnlp.528
Cite (ACL): Pedro Schindler Freire Brasil Ribeiro, Iago Alves Brito, Rafael Teixeira Sousa, Fernanda Bufon Färber, Julia Soares Dollis, and Arlindo Rodrigues Galvão Filho. 2025. Proxy Barrier: A Hidden Repeater Layer Defense Against System Prompt Leakage and Jailbreaking. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9960–9975, Suzhou, China. Association for Computational Linguistics.
Cite (Informal): Proxy Barrier: A Hidden Repeater Layer Defense Against System Prompt Leakage and Jailbreaking (Ribeiro et al., Findings 2025)
PDF: https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.528.pdf
Checklist: 2025.findings-emnlp.528.checklist.pdf