Peiqiang Qiu
2026
ACSE: An Ancient Character Semantic-Aware Embedding for Large Language Models
Zhihan Zhou | Daqian Shi | Lida Shi | Rui Song | Peiqiang Qiu | Xiaolei Diao | Hao Xu
Findings of the Association for Computational Linguistics: ACL 2026
Research on the ancient Chinese language is of great significance for tracing Chinese history and civilization. In the field of large language models, studies on pre-Qin excavated documents such as Oracle Bone Inscriptions, Bronze Inscriptions, and the Bamboo Book of Chu remain insufficient. This is because these ancient characters are poorly digitized, their training corpora are extremely scarce, and they typically carry complex and rich semantic information. Therefore, we propose an ancient character semantic-aware embedding (ACSE) for large language models. This embedding integrates both the glyph and the lexicality of ancient characters and maps them to the modern Chinese semantic space. We also design a two-stage method for lightweight, parameter-efficient training of the embedding. Finally, we conduct extensive experiments on excavated documents from the pre-Qin period, and the results demonstrate the effectiveness of our approach.