Gangyan Ge
2025
BIRD: Bronze Inscription Restoration and Dating
Wenjie Hua | Hoang H Nguyen | Gangyan Ge
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Bronze inscriptions from early China are fragmentary and difficult to date. We introduce BIRD (Bronze Inscription Restoration and Dating), a fully encoded dataset grounded in standard scholarly transcriptions and chronological labels. We further propose an allograph-aware masked language modeling framework that integrates domain- and task-adaptive pretraining with a Glyph Net (GN), which links graphemes and allographs. Experiments show that GN improves restoration, while glyph-biased sampling yields gains in dating.