Multi-Domain Ancient Chinese Named Entity Recognition Based on Attention-Enhanced Pre-trained Language Model

Qi Zhang, Zhiya Duan, Shijie Ma, Shengyu Liu, Zibo Yuan, RuiMin Ma


Abstract
Recent advancements in digital humanities have intensified the demand for intelligent processing of ancient Chinese texts, particularly in specialized domains such as historical records and ancient medical literature. Within this research area, Named Entity Recognition (NER) plays a crucial role, serving as the foundation for knowledge graph construction and deeper humanities computing studies. In this paper, we introduce an architecture specifically designed for multi-domain ancient Chinese NER tasks based on a pre-trained language model (PLM). Building upon the GujiRoberta backbone, we propose the GujiRoberta-BiLSTM-Attention-CRF model. Experimental results on three distinct domain-specific datasets demonstrate that our approach significantly outperforms the official baselines on all three datasets, highlighting the particular effectiveness of integrating an attention mechanism within our architecture.
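To make the described architecture concrete, the following is a minimal sketch (not the authors' code) of a PLM-BiLSTM-Attention-CRF tagger of the kind named in the abstract. It assumes PyTorch, HuggingFace Transformers, and the pytorch-crf package; the checkpoint path, hidden sizes, number of attention heads, and the use of multi-head self-attention over the BiLSTM states are illustrative assumptions, as the paper does not specify them here.

```python
# Hedged sketch of a GujiRoberta-BiLSTM-Attention-CRF sequence tagger.
# Assumptions: PyTorch, HuggingFace Transformers, pytorch-crf (pip install pytorch-crf).
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF


class BiLSTMAttentionCRF(nn.Module):
    def __init__(self, plm_name="path/to/GujiRoberta", num_tags=9, lstm_hidden=256):
        super().__init__()
        # Pre-trained encoder; the checkpoint name above is a placeholder.
        self.encoder = AutoModel.from_pretrained(plm_name)
        enc_dim = self.encoder.config.hidden_size
        self.bilstm = nn.LSTM(enc_dim, lstm_hidden, batch_first=True,
                              bidirectional=True)
        # Self-attention over the BiLSTM states (one possible attention variant).
        self.attn = nn.MultiheadAttention(2 * lstm_hidden, num_heads=4,
                                          batch_first=True)
        self.classifier = nn.Linear(2 * lstm_hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        # Contextual character representations from the PLM.
        x = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        x, _ = self.bilstm(x)
        # Padding positions are masked out of the attention computation.
        x, _ = self.attn(x, x, x, key_padding_mask=(attention_mask == 0))
        emissions = self.classifier(x)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF.
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        # Inference: Viterbi decoding of the best tag sequence per sentence.
        return self.crf.decode(emissions, mask=mask)
```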
Anthology ID:
2025.alp-1.32
Volume:
Proceedings of the Second Workshop on Ancient Language Processing
Month:
May
Year:
2025
Address:
The Albuquerque Convention Center, Laguna
Editors:
Adam Anderson, Shai Gordin, Bin Li, Yudong Liu, Marco C. Passarotti, Rachele Sprugnoli
Venues:
ALP | WS
Publisher:
Association for Computational Linguistics
Pages:
237–241
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.alp-1.32/
Cite (ACL):
Qi Zhang, Zhiya Duan, Shijie Ma, Shengyu Liu, Zibo Yuan, and RuiMin Ma. 2025. Multi-Domain Ancient Chinese Named Entity Recognition Based on Attention-Enhanced Pre-trained Language Model. In Proceedings of the Second Workshop on Ancient Language Processing, pages 237–241, The Albuquerque Convention Center, Laguna. Association for Computational Linguistics.
Cite (Informal):
Multi-Domain Ancient Chinese Named Entity Recognition Based on Attention-Enhanced Pre-trained Language Model (Zhang et al., ALP 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.alp-1.32.pdf