Adversarial Decoding: Generating Readable Documents for Adversarial Objectives

Collin Zhang, Tingwei Zhang, Vitaly Shmatikov


Abstract
We design, implement, and evaluate adversarial decoding, a new, generic text generation technique that produces readable documents for adversarial objectives such as RAG poisoning, jailbreaking, and evasion of defensive filters. Prior generation methods either produce easily detectable gibberish (even methods that optimize for low perplexity), or cannot handle objectives that include embedding similarity. In particular, they cannot produce readable adversarial documents that (1) are retrieved by RAG systems in response to broad classes of queries, and (2) adversarially influence subsequent generation. We measure the effectiveness of adversarial decoding for different objectives and demonstrate that it outperforms existing methods while producing adversarial documents that cannot be automatically distinguished from natural documents by fluency and readability.
Anthology ID:
2026.findings-eacl.108
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2053–2068
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.108/
Cite (ACL):
Collin Zhang, Tingwei Zhang, and Vitaly Shmatikov. 2026. Adversarial Decoding: Generating Readable Documents for Adversarial Objectives. In Findings of the Association for Computational Linguistics: EACL 2026, pages 2053–2068, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Adversarial Decoding: Generating Readable Documents for Adversarial Objectives (Zhang et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.108.pdf
Checklist:
2026.findings-eacl.108.checklist.pdf