Adversarial Tokenization

Renato Geh; Zilei Shao; Guy Van Den Broeck

doi:10.18653/v1/2025.acl-long.1012

Abstract

Current LLM pipelines account for only one possible tokenization of a given string, ignoring exponentially many alternative tokenizations during training and inference. For example, the standard Llama3 tokenization of "penguin" is [p,enguin], yet [peng,uin] is another perfectly valid alternative. In this paper, we show that despite being trained solely on one tokenization, LLMs still retain semantic understanding of other tokenizations, raising questions about the implications for LLM safety. Put succinctly, we answer the following question: can we adversarially tokenize an obviously malicious string to evade safety and alignment restrictions? We show that adversarial tokenization is not only an effective yet previously neglected axis of attack, but also competitive with existing state-of-the-art adversarial approaches, all without changing the text of the harmful request. We empirically validate this exploit across three state-of-the-art LLMs and adversarial datasets, revealing a previously unknown vulnerability in subword models.
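
To make the claim about "exponentially many alternative tokenizations" concrete, the following Python sketch enumerates and counts every way to split a string into tokens drawn from a vocabulary. It is only an illustration under stated assumptions: `VOCAB` is a hypothetical toy vocabulary (not Llama3's real vocabulary), and the helper functions `tokenizations` and `count_tokenizations` are hypothetical names. The code does not reproduce the paper's method for searching adversarial tokenizations; it only shows why a single string admits many valid token sequences.

```python
from functools import lru_cache

# Hypothetical toy vocabulary: NOT the real Llama3 vocabulary. It contains just
# enough subwords to show that "penguin" has several valid tokenizations.
VOCAB = {"p", "e", "n", "g", "u", "i", "enguin", "peng", "uin", "pen", "guin", "in"}


def tokenizations(text: str, vocab: set) -> list:
    """Enumerate every split of `text` into a sequence of vocabulary tokens."""

    @lru_cache(maxsize=None)
    def split_from(start: int) -> tuple:
        if start == len(text):
            return ((),)  # exactly one way to tokenize the empty suffix: no tokens
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in vocab:
                # Prefix the current token to every tokenization of the remainder.
                for rest in split_from(end):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(t) for t in split_from(0)]


def count_tokenizations(text: str, vocab: set) -> int:
    """Count tokenizations by dynamic programming, without enumerating them."""
    n = len(text)
    ways = [0] * (n + 1)
    ways[n] = 1  # the empty suffix has one (empty) tokenization
    for start in range(n - 1, -1, -1):
        ways[start] = sum(
            ways[end] for end in range(start + 1, n + 1) if text[start:end] in vocab
        )
    return ways[0]


if __name__ == "__main__":
    for toks in tokenizations("penguin", VOCAB):
        print(toks)
    print(count_tokenizations("penguin", VOCAB))  # 12
```

With this toy vocabulary, both [p,enguin] and [peng,uin] appear among the 12 tokenizations of "penguin". If the vocabulary contained every substring of a length-n string, each of the n-1 gaps between characters could independently be a token boundary, giving 2^(n-1) tokenizations; this is the exponential growth the abstract refers to. The counting function is the practical choice when only the number is needed, since enumeration itself can be exponentially large.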

Anthology ID:: 2025.acl-long.1012
Volume:: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:: July
Year:: 2025
Address:: Vienna, Austria
Editors:: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:: ACL
Publisher:: Association for Computational Linguistics
Pages:: 20738–20765
URL:: https://preview.aclanthology.org/ingest-emnlp/2025.acl-long.1012/
DOI:: 10.18653/v1/2025.acl-long.1012
Cite (ACL):: Renato Geh, Zilei Shao, and Guy Van Den Broeck. 2025. Adversarial Tokenization. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 20738–20765, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):: Adversarial Tokenization (Geh et al., ACL 2025)
PDF:: https://preview.aclanthology.org/ingest-emnlp/2025.acl-long.1012.pdf