Exploiting Phonetics and Glyph Representation at Radical-level for Classical Chinese Understanding

Junyi Xiang, Maofu Liu


Abstract
The diachronic gap between classical and modern Chinese arises from century-scale language evolution through cumulative changes in phonological, syntactic, and lexical systems. This evolution has produced substantial semantic variation, which poses significant challenges for the computational modeling of historical texts. Existing methods typically improve the classical Chinese understanding of pre-trained language models through corpus pre-training or semantic integration. However, they overlook the synergistic relationship between the phonetic and glyph features of Chinese characters, a critical factor in deciphering character semantics. In this paper, we propose RPGCM, a radical-level phonetics and glyph representation enhanced Chinese model with strong fine-grained semantic modeling capabilities. Our model builds robust contextualized representations through: (1) rule-based radical decomposition and byte pair encoding (BPE)-based radical aggregation for structural pattern recognition, (2) phonetic-glyph semantic mapping, and (3) dynamic semantic fusion. Experimental results on the CCMRC, WYWEB, and C³Bench benchmarks demonstrate RPGCM's superiority and validate that explicit radical-level modeling effectively mitigates semantic variation.
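Step (1), rule-based radical decomposition followed by BPE-style aggregation, can be illustrated with a small sketch. The abstract does not specify the decomposition rules or how merges are chosen, so the following assumes a hypothetical toy decomposition table (`DECOMP`) and standard BPE merge learning, which repeatedly fuses the most frequent adjacent pair of radicals. The function names are illustrative and are not the authors' code.

```python
from collections import Counter

# Hypothetical toy decomposition table (assumption, not the paper's rules).
# A real system would draw on a full ideographic description database.
DECOMP = {
    "湖": ("氵", "古", "月"),
    "糊": ("米", "古", "月"),
    "蝴": ("虫", "古", "月"),
    "河": ("氵", "可"),
    "清": ("氵", "青"),
    "情": ("忄", "青"),
    "晴": ("日", "青"),
}


def decompose(char):
    """Rule-based step: map a character to its radical sequence (itself if unknown)."""
    return DECOMP.get(char, (char,))


def pair_counts(vocab):
    """Count adjacent radical pairs, weighted by how often each sequence occurs."""
    counts = Counter()
    for seq, freq in vocab.items():
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(seq, pair):
    """Replace every occurrence of `pair` in `seq` with a single merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return tuple(merged)


def learn_radical_merges(text, num_merges):
    """Learn BPE merges over the radical sequences of the characters in `text`."""
    vocab = Counter(decompose(ch) for ch in text)
    merges = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:
            break  # every sequence is a single symbol; nothing left to merge
        best = max(counts, key=counts.get)  # most frequent adjacent pair
        merges.append(best)
        new_vocab = Counter()
        for seq, freq in vocab.items():
            new_vocab[merge_pair(seq, best)] += freq
        vocab = new_vocab
    return merges, vocab


merges, vocab = learn_radical_merges("湖糊蝴清情晴河", num_merges=1)
print(merges)
```

In this toy corpus the pair (古, 月) occurs in 湖, 糊, and 蝴, so the first merge recovers it as a sub-character unit. Because all three characters are read hú, the learned unit coincides with their shared phonetic component 胡, which hints at why radical-level units can carry both glyph and phonetic signal.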
Anthology ID:
2025.findings-acl.1173
Volume:
Findings of the Association for Computational Linguistics: ACL 2025
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
22850–22871
URL:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1173/
Cite (ACL):
Junyi Xiang and Maofu Liu. 2025. Exploiting Phonetics and Glyph Representation at Radical-level for Classical Chinese Understanding. In Findings of the Association for Computational Linguistics: ACL 2025, pages 22850–22871, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Exploiting Phonetics and Glyph Representation at Radical-level for Classical Chinese Understanding (Xiang & Liu, Findings 2025)
PDF:
https://preview.aclanthology.org/landing_page/2025.findings-acl.1173.pdf