Zhijun Liu
2026
Zero-shot Jianzi Recognition as Structured Visual Information Extraction in Open Compositional Symbolic Systems
Zehan Li | Fu Zhang | Zhijun Liu | Jingwei Cheng
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Guqin (古琴) Jianzi (減字) is an open and freely compositional tablature system that encodes performance actions rather than acoustic outcomes. Its automatic recognition remains largely unexplored, as conventional OCR assumes a closed and enumerable glyph set and struggles with Jianzi’s unbounded composition and manuscript-level variability. We introduce Zero-shot Jianzi Recognition, which formulates Jianzi recognition as vision-to-sequence prediction of canonical component sequences under a zero-shot split. To enable scalable supervision, we construct Synthetic-JZ from aligned online composition metadata. We then synthesize manuscript-like training images via component-wise style recomposition and manuscript-domain noise modeling, and fine-tune a vision–language model for end-to-end component sequence recognition. At inference time, a lightweight legality-guided correction module re-ranks decoding candidates, suppressing structural hallucinations without modifying the backbone. Experiments on two benchmarks show that our method achieves 63.02% sequence accuracy on Real-JZ, our manually annotated real-world Jianzi benchmark, surpassing Gemini-3-Pro by 35.11%. This result highlights the feasibility of reliable automated Jianzi recognition and its potential for large-scale digitization of historical Guqin Jianzi Pu manuscripts.
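
The abstract does not specify how the legality-guided correction module or the sequence accuracy metric is implemented. The following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes that the decoder returns candidate component sequences with log-probability scores, that legality can be checked by a predicate, and that sequence accuracy means exact match of the full component sequence. The names `rerank_by_legality`, `sequence_accuracy`, `is_legal`, the fixed `penalty`, and the placeholder components "A", "B", "X" are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A decoding candidate: (component sequence, decoder log-probability).
# This representation is an assumption made for illustration.
Candidate = Tuple[List[str], float]


def rerank_by_legality(
    candidates: Sequence[Candidate],
    is_legal: Callable[[List[str]], bool],
    penalty: float = 5.0,
) -> List[Candidate]:
    """Sort candidates by score, subtracting a fixed penalty from illegal ones.

    The backbone scores are left unchanged; only the ordering is affected.
    """
    def adjusted_score(candidate: Candidate) -> float:
        sequence, log_prob = candidate
        return log_prob if is_legal(sequence) else log_prob - penalty

    return sorted(candidates, key=adjusted_score, reverse=True)


def sequence_accuracy(
    predictions: Sequence[Sequence[str]],
    references: Sequence[Sequence[str]],
) -> float:
    """Fraction of predictions that match their reference sequence exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    matches = sum(list(p) == list(r) for p, r in zip(predictions, references))
    return matches / len(references)


if __name__ == "__main__":
    # Toy legality rule (hypothetical): non-empty and every component is known.
    known_components = {"A", "B"}

    def toy_is_legal(sequence: List[str]) -> bool:
        return len(sequence) > 0 and all(c in known_components for c in sequence)

    beam = [(["A", "X"], -0.1), (["A", "B"], -0.5)]
    best_sequence, _ = rerank_by_legality(beam, toy_is_legal)[0]
    print(best_sequence)
    print(sequence_accuracy([["A", "B"], ["A"]], [["A", "B"], ["B"]]))
```

In this sketch the penalty is a soft constraint: an illegal candidate can still win if every legal alternative scores far lower. A hard filter that discards illegal candidates outright would be an equally plausible design.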