Vector Calligrapher: Generating Scalable Vector Graphics via Structured Linguistic Supervision

Bo Zhou, Xikang Chen, Yan Gong, Yin Zhang


Abstract
Generating SVG-based fonts requires Multimodal Large Language Models (MLLMs) to translate high-level linguistic intent into low-level, topologically constrained symbolic sequences. However, current approaches suffer from two fundamental misalignments. The first is the semantic ambiguity of unstructured natural language, which hinders precise geometric control. The second is the inefficiency of generic text tokenizers, which fragment coordinate-dense SVG XML into excessively long sequences with low information density. In this work, we propose Vector Calligrapher, a system that treats SVG generation as a conditional language modeling task optimized for both semantic grounding and representational efficiency. To bridge the semantic gap, we introduce the Font Description Framework, a form of structured linguistic supervision that decomposes typographic style into interpretable linguistic dimensions (e.g., historical lineage, affective metaphors) aligned with the compositional syntax of SVG. To address the tokenization bottleneck, we design a scalable separated-coordinate strategy that avoids the vocabulary explosion of flattened tokens while substantially compressing sequence length. Supported by VectorFont, a dataset of over 10 million hierarchically annotated glyphs, our approach improves CLIP score by +23%, reduces geometric error by ≈48%, and raises generation efficiency to an 18% Command-per-Token (C/T) ratio, a 6× increase in information density over standard baselines. These results demonstrate that combining structured linguistic supervision with efficient symbolic tokenization is essential for reliable, controllable vector graphics synthesis. The VectorFont dataset, code, and model weights will be publicly released.
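The abstract names a separated-coordinate tokenization and a Command-per-Token (C/T) ratio without defining either precisely. The following is a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes absolute, uppercase path commands only (M, L, Q, C, Z), coordinates quantized to a 256-step integer grid, hypothetical token names such as `<X10>` and `<Y20>` with separate x and y vocabularies, and reads C/T as the number of SVG commands divided by the number of tokens. It shows why emitting x and y as separate tokens keeps the vocabulary linear in the grid size, whereas a flattened token per (x, y) pair grows quadratically.

```python
import re

# Number of coordinate values each absolute SVG path command takes.
# Assumption for this sketch: only uppercase M, L, Q, C, Z are supported.
ARITY = {"M": 2, "L": 2, "Q": 4, "C": 6, "Z": 0}


def parse_path(d: str) -> list[tuple[str, list[float]]]:
    """Split an SVG path 'd' string into (command, args) pairs."""
    tokens = re.findall(r"[MLQCZ]|-?\d+(?:\.\d+)?", d)
    commands = []
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd not in ARITY:
            raise ValueError(f"expected a command, got {cmd!r}")
        n = ARITY[cmd]
        args = [float(t) for t in tokens[i + 1 : i + 1 + n]]
        if len(args) != n:
            raise ValueError(f"{cmd} needs {n} coordinates, got {len(args)}")
        commands.append((cmd, args))
        i += 1 + n
    return commands


def separated_tokens(commands, grid: int = 256) -> list[str]:
    """Emit one token per command and one token per quantized coordinate.

    Hypothetical scheme: x and y values get separate token families,
    so a pair (x, y) costs two tokens instead of one flattened token.
    """
    out = []
    for cmd, args in commands:
        out.append(f"<{cmd}>")
        for k, v in enumerate(args):
            q = min(max(int(round(v)), 0), grid - 1)  # clamp to the grid
            axis = "X" if k % 2 == 0 else "Y"
            out.append(f"<{axis}{q}>")
    return out


def vocab_sizes(n_commands: int, grid: int) -> tuple[int, int]:
    """Vocabulary size for separated vs. flattened coordinate tokens."""
    separated = n_commands + 2 * grid  # linear in grid size
    flattened = n_commands + grid * grid  # one token per (x, y) cell
    return separated, flattened


def command_per_token(commands, tokens) -> float:
    """C/T ratio, read here as commands divided by tokens."""
    return len(commands) / len(tokens)


if __name__ == "__main__":
    cmds = parse_path("M 10 20 L 30 40 Q 50 60 70 80 Z")
    toks = separated_tokens(cmds)
    print(toks)
    print(command_per_token(cmds, toks))
    print(vocab_sizes(len(ARITY), 256))
```

For the toy path above, the sketch produces 12 tokens for 4 commands, and with a 256-step grid the separated vocabulary has 517 entries against 65,541 for flattened pair tokens. Real glyph outlines are dominated by Q and C segments, so the C/T value of this toy path says nothing about the figures reported in the abstract.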
Anthology ID:
2026.acl-long.511
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
11152–11168
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.511/
Cite (ACL):
Bo Zhou, Xikang Chen, Yan Gong, and Yin Zhang. 2026. Vector Calligrapher: Generating Scalable Vector Graphics via Structured Linguistic Supervision. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11152–11168, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Vector Calligrapher: Generating Scalable Vector Graphics via Structured Linguistic Supervision (Zhou et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.511.pdf
Checklist:
2026.acl-long.511.checklist.pdf