Zixuan Lan


2025

Text or Pixels? Evaluating Efficiency and Understanding of LLMs with Visual Text Inputs
Yanhong Li | Zixuan Lan | Jiawei Zhou
Findings of the Association for Computational Linguistics: EMNLP 2025

Large language models (LLMs) and their multimodal variants can now process visual inputs, including images of text. This raises an intriguing question: can we compress textual inputs by feeding them as images, reducing token usage while preserving performance? In this paper, we show that *visual text representations* are a practical and surprisingly effective form of input compression for decoder LLMs. We exploit this idea by rendering long text inputs as a single image and providing it directly to the model, which dramatically reduces the number of decoder tokens required. Through experiments on two distinct benchmarks, RULER (long-context retrieval) and CNN/DailyMail (document summarization), we demonstrate that this text-as-image method yields substantial token savings *without degrading task performance*.
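
The rendering step described in the abstract is simple to prototype. Below is a minimal sketch using Pillow; the function name, wrap width, and font handling are illustrative assumptions, not the authors' actual pipeline, which the abstract does not specify.

```python
from PIL import Image, ImageDraw, ImageFont
import textwrap


def render_text_as_image(text: str, wrap_width: int = 100, font_size: int = 14) -> Image.Image:
    """Render a long text input onto a single white image.

    The image can then be sent to a vision-capable LLM in place of the
    raw text, trading many text tokens for a much smaller number of
    image tokens.
    """
    # Wrap each paragraph to a fixed character width, preserving blank lines.
    lines: list[str] = []
    for paragraph in text.split("\n"):
        lines.extend(textwrap.wrap(paragraph, width=wrap_width) or [""])

    # The default bitmap font keeps the sketch dependency-free; a monospace
    # TTF loaded via ImageFont.truetype() would give sharper, more uniform glyphs.
    font = ImageFont.load_default()
    line_height = font_size + 4
    img_width = wrap_width * 7 + 20  # roughly 7 px per character for the default font
    img_height = line_height * len(lines) + 20

    img = Image.new("RGB", (img_width, img_height), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((10, 10 + i * line_height), line, fill="black", font=font)
    return img


if __name__ == "__main__":
    document = "Some long document...\n" * 50
    image = render_text_as_image(document)
    image.save("document.png")  # attach this image to a multimodal model request
```

How the token savings materialize depends on the downstream model: image resolution, the vision encoder's patch size, and the multimodal API's image-token accounting all vary by provider, so any concrete compression ratio would need to be measured per model.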