Yaning Jia
2025
Recontextualizing Revitalization: A Mixed Media Approach to Reviving the Nüshu Language
Ivory Yang | Xiaobo Guo | Yuxin Wang | Hefan Zhang | Yaning Jia | William Dinauer | Soroush Vosoughi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Nüshu is an endangered language from Jiangyong County, China, and the world’s only known writing system created and used exclusively by women. Recent Natural Language Processing (NLP) work has digitized small Nüshu-Chinese corpora, but the script remains computationally inaccessible due to its handwritten, mixed-media form and dearth of multimodal resources. We address this gap with two novel datasets: NüshuVision, an image corpus of 500 rendered sentences in traditional vertical, right-to-left orthography, and NüshuStrokes, the first sequential handwriting recordings of all 397 Unicode Nüshu characters by an expert calligrapher. Evaluating five state-of-the-art Chinese Optical Character Recognition (OCR) systems on NüshuVision shows that all fail entirely, each yielding a Character Error Rate (CER) of 1.0. Fine-tuning Microsoft’s TrOCR on NüshuVision lowers CER to 0.67, a modest yet meaningful improvement. These contributions establish the first multimodal foundation for Nüshu revitalization and offer a culturally grounded framework for language preservation.
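The abstract reports results as Character Error Rate (CER) but does not spell out how it is computed. A minimal sketch of the conventional definition follows: character-level edit distance between the reference and the OCR output, divided by the reference length. Whether the paper applies any normalization or clipping is not stated here, so treat this as an assumption. The function names `edit_distance` and `cer` and the placeholder strings are illustrative only.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]  # turning i reference characters into an empty string costs i deletions
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # delete ref_char
                curr[j - 1] + 1,                  # insert hyp_char
                prev[j - 1] + substitution_cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Placeholder strings, not real Nüshu data.
print(cer("abcd", "abcd"))  # every character correct
print(cer("abcd", "wxyz"))  # every character wrong
print(cer("abcd", "abcf"))  # one of four characters wrong
```

Under this definition, a CER of 1.0 corresponds to output in which, in effect, no reference character is recovered, and 0.67 to roughly two errors for every three reference characters. Because insertions count as errors, the unclipped value can exceed 1.0 when the output is longer than the reference.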