SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding

Shuyang Hou; Yi Hu; Muhan Zhang

Abstract

Recent advances in large language models (LLMs) have significantly enhanced their reasoning capabilities. However, they continue to struggle with basic character-level tasks, such as counting letters in words, a problem rooted in their tokenization process. While existing benchmarks have highlighted this weakness through basic character operations, such failures are often dismissed as lacking practical relevance. Yet many real-world applications, such as navigating text-based maps or interpreting structured tables, rely heavily on precise sub-token understanding. To address this, we introduce SubTokenTest, a comprehensive benchmark that assesses sub-token understanding through **practical, utility-driven** tasks. Our benchmark includes ten tasks across four domains and isolates tokenization-related failures by decoupling performance from complex reasoning. We provide a comprehensive evaluation of nine advanced LLMs. Additionally, we investigate the impact of test-time scaling on sub-token reasoning and explore how character-level information is encoded within the hidden states.
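To make the tokenization issue mentioned in the abstract concrete, the following is a minimal illustrative sketch and not part of the benchmark itself. It assumes a hypothetical toy vocabulary (`TOY_VOCAB`) and a simple greedy longest-match segmenter (`greedy_tokenize`); real LLM tokenizers learn their vocabularies from data and segment differently. The point is only that a model receives a short sequence of opaque token IDs, while a letter-counting question is posed at the character level.

```python
# Hypothetical toy vocabulary for illustration only; real tokenizers
# (for example BPE-based ones) learn far larger vocabularies from data.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def greedy_tokenize(word, vocab):
    """Segment `word` into vocabulary pieces by greedy longest match."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate substring first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {i}")
    return pieces


def count_letter(word, letter):
    """Character-level ground truth: occurrences of `letter` in `word`."""
    return word.count(letter)


pieces = greedy_tokenize("strawberry", TOY_VOCAB)
token_ids = [TOY_VOCAB[p] for p in pieces]

print(pieces)                           # ['straw', 'berry']
print(token_ids)                        # [101, 102]
print(count_letter("strawberry", "r"))  # 3
```

Under these assumptions the ten-character word reaches the model as two IDs, so the three occurrences of "r" are not directly visible in the input and must be recovered from whatever character-level information the token representations happen to encode.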

Anthology ID: 2026.acl-long.915
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 19957–19999
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.915/
Cite (ACL): Shuyang Hou, Yi Hu, and Muhan Zhang. 2026. SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 19957–19999, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): SubTokenTest: A Practical Benchmark for Real-World Sub-token Understanding (Hou et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.915.pdf
Checklist: 2026.acl-long.915.checklist.pdf