Chuntao Li
2026
Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation
Jianing Zhang | Runan Li | Honglin Pang | Ding Xia | Zhou Zhu | Qian Zhang | Chuntao Li | Xi Yang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Jianing Zhang | Runan Li | Honglin Pang | Ding Xia | Zhou Zhu | Qian Zhang | Chuntao Li | Xi Yang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Deciphering ancient Chinese Oracle Bone Script (OBS) is a challenging task that offers insights into the beliefs, systems, and culture of the ancient era. Existing approaches treat decipherment as a closed-set image recognition problem, which fails to bridge the “interpretation gap”: while individual characters are often unique and rare, they are composed of a limited set of recurring, pictographic components that carry transferable semantic meanings. To leverage this structural logic, we propose an agent-driven Vision-Language Model (VLM) framework that integrates a VLM for precise visual grounding with an LLM-based agent to automate a reasoning chain of component identification, graph-based knowledge retrieval, and relationship inference for linguistically accurate interpretation. To support this, we also introduce OB-Radix, an expert-annotated dataset providing structural and semantic data absent from prior corpora, comprising 1,022 character images (934 unique characters) and 1,853 fine-grained component images across 478 distinct components with verified explanations. By evaluating our system across three benchmarks of different tasks, we demonstrate that our framework yields more detailed and precise decipherments compared to baseline methods.
2024
Ancient Chinese Glyph Identification Powered by Radical Semantics
Yang Chi | Fausto Giunchiglia | Chuntao Li | Hao Xu
Findings of the Association for Computational Linguistics: ACL 2024
Yang Chi | Fausto Giunchiglia | Chuntao Li | Hao Xu
Findings of the Association for Computational Linguistics: ACL 2024
The ancestor of Chinese character – the ancient characters from about 1300 BC to 200 BC are not fixed in their writing glyphs. At the same or different points in time, one character can possess multiple glyphs that are different in shapes or radicals. Nearly half of ancient glyphs have not been deciphered yet. This paper proposes an innovative task of ancient Chinese glyph identification, which aims at inferring the Chinese character label for the unknown ancient Chinese glyphs which are not in the training set based on the image and radical information. Specifically, we construct a Chinese glyph knowledge graph (CGKG) associating glyphs in different historical periods according to the radical semantics, and propose a multimodal Chinese glyph identification framework (MCGI) fusing the visual, textual, and the graph data. The experiment is designed on a real Chinese glyph dataset spanning over 1000 years, it demonstrates the effectiveness of our method, and reports the potentials of each modality on this task. It provides a preliminary reference for the automatic ancient Chinese character deciphering at the glyph level.
2022
ZiNet: Linking Chinese Characters Spanning Three Thousand Years
Yang Chi | Fausto Giunchiglia | Daqian Shi | Xiaolei Diao | Chuntao Li | Hao Xu
Findings of the Association for Computational Linguistics: ACL 2022
Yang Chi | Fausto Giunchiglia | Daqian Shi | Xiaolei Diao | Chuntao Li | Hao Xu
Findings of the Association for Computational Linguistics: ACL 2022
Modern Chinese characters evolved from 3,000 years ago. Up to now, tens of thousands of glyphs of ancient characters have been discovered, which must be deciphered by experts to interpret unearthed documents. Experts usually need to compare each ancient character to be examined with similar known ones in whole historical periods. However, it is inevitably limited by human memory and experience, which often cost a lot of time but associations are limited to a small scope. To help researchers discover glyph similar characters, this paper introduces ZiNet, the first diachronic knowledge base describing relationships and evolution of Chinese characters and words. In addition, powered by the knowledge of radical systems in ZiNet, this paper introduces glyph similarity measurement between ancient Chinese characters, which could capture similar glyph pairs that are potentially related in origins or semantics. Results show strong positive correlations between scores from the method and from human experts. Finally, qualitative analysis and implicit future applications are presented.