Gangyan Ge


2025

BIRD: Bronze Inscription Restoration and Dating
Wenjie Hua | Hoang H Nguyen | Gangyan Ge
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Bronze inscriptions from early China are fragmentary and difficult to date. We introduce BIRD (Bronze Inscription Restoration and Dating), a fully encoded dataset grounded in standard scholarly transcriptions and chronological labels. We further propose an allograph-aware masked language modeling framework that integrates domain- and task-adaptive pretraining with a Glyph Net (GN), which links graphemes and allographs. Experiments show that GN improves restoration, while glyph-biased sampling yields gains in dating.