Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning

Yifan Wang; Shiyu Li; Peiming Li; Xiaochen Yang; Zheng Wei; Yang Tang

Abstract

Chain-of-Thought (CoT) prompting has achieved remarkable success in unlocking the reasoning capabilities of Large Language Models (LLMs), but its verbosity imposes substantial computational overhead. Recent works that compress the reasoning chain often focus exclusively on outcome alignment and lack supervision of the intermediate reasoning process, which makes the latent reasoning chain hard to analyze. To address these challenges, we introduce **Render-of-Thought (RoT)**, the first framework to reify the reasoning chain by rendering textual steps into images, making the latent rationale explicit and traceable. Specifically, we leverage the vision encoders of existing Vision Language Models (VLMs) as semantic anchors to align the vision embeddings with the textual space. This design ensures **plug-and-play** implementation without incurring additional pre-training overhead. Extensive experiments on mathematical and logical reasoning benchmarks demonstrate that our method achieves 3-4× token compression and substantial inference acceleration compared to explicit CoT. Furthermore, it achieves a competitive efficiency-accuracy trade-off compared to other methods, validating the feasibility of this paradigm. Our code is available at https://github.com/TencentBAC/RoT
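To make the "3-4× token compression" figure concrete, the following is a minimal arithmetic sketch, not code from the RoT repository. The function name `compression_ratio` and the token counts are hypothetical and chosen only for illustration; they are not results reported in the paper.

```python
def compression_ratio(text_tokens: int, latent_tokens: int) -> float:
    """Ratio of explicit CoT text tokens to latent (visual) tokens.

    Both arguments are hypothetical counts for one reasoning chain;
    this helper is an illustration, not part of the authors' code.
    """
    if text_tokens <= 0 or latent_tokens <= 0:
        raise ValueError("token counts must be positive")
    return text_tokens / latent_tokens


# Hypothetical example: a 120-token textual rationale represented
# by 32 latent visual tokens (numbers assumed, not from the paper).
explicit_cot_tokens = 120
latent_visual_tokens = 32
ratio = compression_ratio(explicit_cot_tokens, latent_visual_tokens)
print(f"{ratio:.2f}x compression")  # prints "3.75x compression"
```

Under these assumed counts the ratio falls inside the 3-4× range the abstract reports; the actual per-benchmark counts are given in the paper itself.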

Anthology ID: 2026.acl-long.2097
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 45236–45253
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2097/
Cite (ACL): Yifan Wang, Shiyu Li, Peiming Li, Xiaochen Yang, Zheng Wei, and Yang Tang. 2026. Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 45236–45253, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning (Wang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.2097.pdf
Checklist: 2026.acl-long.2097.checklist.pdf
