@inproceedings{li-etal-2025-text,
title = "Text or Pixels? Evaluating Efficiency and Understanding of {LLM}s with Visual Text Inputs",
author = "Li, Yanhong and
Lan, Zixuan and
Zhou, Jiawei",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/name-variant-enfa-fane/2025.findings-emnlp.558/",
doi = "10.18653/v1/2025.findings-emnlp.558",
pages = "10564--10578",
ISBN = "979-8-89176-335-7",
abstract = "Large language models (LLMs) and their multimodal variants can now process visual inputs, including images of text. This raises an intriguing question: Can we compress textual inputs by feeding them as images to reduce token usage while preserving performance?In this paper, we show that *visual text representations* are a practical and surprisingly effective form of input compression for decoder LLMs. We exploit this idea by rendering long text inputs as a single image and providing it directly to the model. This approach dramatically reduces the number of decoder tokens required, offering a new form of input compression. Through experiments on two distinct benchmarks {---} RULER (long-context retrieval) and CNN/DailyMail (document summarization) {---} we demonstrate that this text-as-image method yields substantial token savings *without degrading task performance*."
}