Comparing Text Compression Capabilities of Large Language Models with Traditional Compression Algorithms

Mehran Haddadi, William John Teahan


Abstract
This work evaluates the non-English and unstructured text compression performance of Large Language Models (LLMs) by comparing them with traditional baselines on datasets in the eight most widely spoken languages. Experimental results show that the evaluated LLM (LLaMA-3.2-1B) was considerably outperformed by the baselines, particularly on non-English datasets, where its performance relative to the best baseline was, on average, more than three times worse than on English datasets. It also compressed unstructured English data up to more than twice as poorly as plain English data. Traditional methods, however, remained largely dataset-agnostic. Surprisingly, the LLM achieved worse compression ratios on some datasets than others despite modeling them more accurately. Overall, these outcomes, together with substantially higher compression time and resource consumption, indicate that current LLMs are highly impractical for the compression task, where traditional methods continue to excel. Code is available at: https://github.com/mehranhaddadi13/llm_compress.
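As a minimal illustration of the metric being compared, the sketch below measures compression ratio (compressed size divided by original size; lower is better) for a few standard-library compressors. These are generic stand-ins, not necessarily the baselines used in the paper, and the sample text is hypothetical.

```python
# Illustrative sketch (not the paper's code): compute compression ratio
# for stdlib compressors on a sample text. Ratio = compressed bytes /
# original bytes; lower is better.
import bz2
import lzma
import zlib


def compression_ratio(data: bytes, compress) -> float:
    """Return compressed size divided by original size."""
    return len(compress(data)) / len(data)


if __name__ == "__main__":
    # Repetitive sample text compresses well; real evaluations would use
    # full datasets in several languages.
    sample = ("The quick brown fox jumps over the lazy dog. " * 100).encode("utf-8")
    for name, fn in [("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)]:
        print(f"{name}: {compression_ratio(sample, fn):.4f}")
```

A model-based compressor would be scored the same way, with the model's cross-entropy on the text setting a lower bound on the achievable ratio.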
Anthology ID:
2026.eacl-srw.16
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Selene Baez Santamaria, Sai Ashish Somayajula, Atsuki Yamaguchi
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
219–232
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.16/
Cite (ACL):
Mehran Haddadi and William John Teahan. 2026. Comparing Text Compression Capabilities of Large Language Models with Traditional Compression Algorithms. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 219–232, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Comparing Text Compression Capabilities of Large Language Models with Traditional Compression Algorithms (Haddadi & Teahan, EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-srw.16.pdf