Kyle Montgomery
2026
Fico: Evaluating Vision-Language Models under Visual Fidelity and Compression at Scale
Jianhong Tu | Nicholas Crispino | Kyle Montgomery | Chenguang Wang | Dawn Song
Findings of the Association for Computational Linguistics: ACL 2026
Visual text compression is an emerging paradigm for rendering text as images for processing by vision-language models (VLMs), enabling higher information density per context token. However, the robustness of VLMs under dense, text-based visual inputs remains unevaluated. We introduce Fico, a benchmark designed to assess VLM robustness across seven controlled variants of visual fidelity and information density. Fico spans documents of 8k to 64k tokens and includes three tasks of increasing semantic granularity: optical character recognition (OCR), needle-in-a-haystack (NIAH) retrieval, and visual question answering (VQA). Evaluating 13 general-purpose VLMs and 3 OCR-specialized models reveals three consistent trends: performance drops sharply under increased density or reduced resolution; cross-task transfer between OCR, NIAH, and VQA is limited; and VQA is comparatively robust because low-level details are lost before high-level semantics. By exposing failure modes that remain invisible under conventional VLM evaluations, Fico establishes a rigorous test-bed for visual text compression.
2024
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
Eric Pasewark | Kyle Montgomery | Kefei Duan | Dawn Song | Chenguang Wang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We present a new method for large language models to solve compositional tasks. Although they have shown strong performance on traditional language understanding tasks, large language models struggle to solve compositional tasks, where the solution depends on solving smaller instances of the same problem. We propose a natural approach to solve compositional tasks recursively. Our method, Re-Tuning, tunes models to break down a problem into subproblems, solve those subproblems, and combine the results. We show that our method significantly improves model performance on three representative compositional tasks: integer addition, dynamic programming, and parity. Compared to state-of-the-art methods that keep intermediate steps towards solving the problems, Re-Tuning achieves significantly higher accuracy and is more GPU memory efficient.
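As a rough illustration of the recursive pattern the abstract describes (break a problem into a smaller instance of itself, solve that instance, then combine the results), the parity task can be sketched as below. This is only an ordinary recursive function written under that reading of the abstract; it is not the Re-Tuning training procedure, and the function name `parity` and the choice to peel off one trailing bit per call are assumptions made for the example.

```python
def parity(bits: str) -> int:
    """Return 1 if the string contains an odd number of '1' characters, else 0.

    Hypothetical sketch: each call hands a smaller instance of the same
    problem to a recursive call, then combines that answer with one local step.
    """
    # Base case: an empty or single-character string is answered directly.
    if len(bits) <= 1:
        return int(bits == "1")

    # Subproblem: parity of everything except the last bit.
    sub_result = parity(bits[:-1])

    # Combine: XOR the subproblem's answer with the last bit.
    return sub_result ^ int(bits[-1] == "1")
```

In this sketch each call only has to handle one combine step plus the answer returned from the smaller instance, which is the property that makes the recursive formulation shorter per step than writing out every intermediate result in one pass.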