SemToken: Semantic-Aware Tokenization for Efficient Long-Context Language Models

Dong Liu, Yanxuan Yu


Abstract
Long-context language models face efficiency challenges as context lengths expand. Traditional tokenization methods like BPE operate on frequency statistics, ignoring semantic structure and over-tokenizing redundant spans. We propose SemToken, a semantic-aware tokenization framework that adaptively compresses token sequences based on semantic density. SemToken uses lightweight encoders to identify and merge semantically equivalent spans, allocates variable granularity based on local semantic density, and dynamically adjusts token budgets during generation. Evaluations on WikiText-103, LongBench, and BookSum demonstrate 2.4× token reduction, 1.9× inference speedup, and 67% memory reduction while preserving or improving model quality. SemToken integrates seamlessly with existing models and achieves multiplicative benefits when combined with FlashAttention (up to 2.7× total speedup).
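As a rough illustration of the span-merging idea described in the abstract, the following minimal sketch greedily merges adjacent tokens whose embeddings are nearly identical. It is not the authors' implementation: the function name merge_similar_spans, the cosine-similarity criterion, the 0.9 threshold, and the toy 2-dimensional embeddings are all assumptions chosen for illustration, and a real system would obtain embeddings from a lightweight encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merge_similar_spans(tokens, embeddings, threshold=0.9):
    """Greedily merge adjacent tokens with similar embeddings.

    Hypothetical helper: a token joins the previous span when its embedding's
    cosine similarity to that span's running mean embedding is >= threshold.
    Returns a list of (span_tokens, mean_embedding) pairs.
    """
    if len(tokens) != len(embeddings):
        raise ValueError("tokens and embeddings must have the same length")
    spans = []
    for tok, emb in zip(tokens, embeddings):
        if spans and cosine(spans[-1][1], emb) >= threshold:
            prev_tokens, prev_mean = spans[-1]
            n = len(prev_tokens)
            # Update the running mean of the span's embedding.
            new_mean = [(p * n + e) / (n + 1) for p, e in zip(prev_mean, emb)]
            spans[-1] = (prev_tokens + [tok], new_mean)
        else:
            spans.append(([tok], list(emb)))
    return spans


# Toy input: the two "very" tokens share an embedding, so they merge.
tokens = ["very", "very", "good", "movie"]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
spans = merge_similar_spans(tokens, embeddings)
compression_ratio = len(tokens) / len(spans)
```

On this toy input the four tokens collapse into three spans, because only the repeated "very" pair passes the similarity threshold. Raising or lowering the threshold trades compression against the risk of merging spans that are not actually equivalent.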
Anthology ID:
2026.starsem-conference.1
Volume:
Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Saif M. Mohammad, Nedjma Ousidhoum
Venues:
*SEM | WS
Publisher:
Association for Computational Linguistics
Pages:
1–12
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.starsem-conference.1/
Cite (ACL):
Dong Liu and Yanxuan Yu. 2026. SemToken: Semantic-Aware Tokenization for Efficient Long-Context Language Models. In Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026), pages 1–12, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
SemToken: Semantic-Aware Tokenization for Efficient Long-Context Language Models (Liu & Yu, *SEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.starsem-conference.1.pdf