Fico: Evaluating Vision-Language Models under Visual Fidelity and Compression at Scale

Jianhong Tu, Nicholas Crispino, Kyle Montgomery, Chenguang Wang, Dawn Song


Abstract
Visual text compression is an emerging paradigm that renders text as images for processing by vision-language models (VLMs), enabling higher information density per context token. However, the robustness of VLMs under dense, text-based visual inputs remains unevaluated. We introduce Fico, a benchmark designed to assess VLM robustness across seven controlled variants of visual fidelity and information density. Fico spans documents of 8k to 64k tokens and includes three tasks of increasing semantic granularity: optical character recognition (OCR), needle-in-a-haystack (NIAH) retrieval, and visual question answering (VQA). Evaluating 13 general-purpose VLMs and 3 OCR-specialized models reveals three consistent trends: performance drops sharply under increased density or reduced resolution; cross-task transfer between OCR, NIAH, and VQA is limited; and VQA is comparatively robust because low-level details are lost before high-level semantics. By exposing failure modes that remain invisible under conventional VLM evaluations, Fico establishes a rigorous test-bed for visual text compression.
Anthology ID: 2026.findings-acl.1758
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 35261–35277
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1758/
Cite (ACL): Jianhong Tu, Nicholas Crispino, Kyle Montgomery, Chenguang Wang, and Dawn Song. 2026. Fico: Evaluating Vision-Language Models under Visual Fidelity and Compression at Scale. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35261–35277, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Fico: Evaluating Vision-Language Models under Visual Fidelity and Compression at Scale (Tu et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1758.pdf
Checklist: 2026.findings-acl.1758.checklist.pdf