Fico: Evaluating Vision-Language Models under Visual Fidelity and Compression at Scale
Jianhong Tu, Nicholas Crispino, Kyle Montgomery, Chenguang Wang, Dawn Song
Abstract
Visual text compression is an emerging paradigm for rendering text as images for processing by vision-language models (VLMs), enabling higher information density per context token. However, the robustness of VLMs under dense, text-based visual inputs remains unevaluated. We introduce Fico, a benchmark designed to assess VLM robustness across seven controlled variants of visual fidelity and information density. Fico spans documents of 8k to 64k tokens and includes three tasks of increasing semantic granularity: optical character recognition (OCR), needle-in-a-haystack (NIAH) retrieval, and visual question answering (VQA). Evaluating 13 general-purpose VLMs and 3 OCR-specialized models reveals three consistent trends: performance drops sharply under increased density or reduced resolution; cross-task transfer between OCR, NIAH, and VQA is limited; and VQA is comparatively robust because low-level details are lost before high-level semantics. By exposing failure modes that remain invisible under conventional VLM evaluations, Fico establishes a rigorous test-bed for visual text compression.
- Anthology ID:
- 2026.findings-acl.1758
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 35261–35277
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1758/
- Cite (ACL):
- Jianhong Tu, Nicholas Crispino, Kyle Montgomery, Chenguang Wang, and Dawn Song. 2026. Fico: Evaluating Vision-Language Models under Visual Fidelity and Compression at Scale. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35261–35277, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Fico: Evaluating Vision-Language Models under Visual Fidelity and Compression at Scale (Tu et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1758.pdf