Recontextualizing Revitalization: A Mixed Media Approach to Reviving the Nüshu Language

Ivory Yang, Xiaobo Guo, Yuxin Wang, Hefan Zhang, Yaning Jia, William Dinauer, Soroush Vosoughi


Abstract
Nüshu is an endangered language from Jiangyong County, China, and the world’s only known writing system created and used exclusively by women. Recent Natural Language Processing (NLP) work has digitized small Nüshu-Chinese corpora, but the script remains computationally inaccessible due to its handwritten, mixed-media form and the dearth of multimodal resources. We address this gap with two novel datasets: NüshuVision, an image corpus of 500 rendered sentences in traditional vertical, right-to-left orthography, and NüshuStrokes, the first sequential handwriting recordings of all 397 Unicode Nüshu characters by an expert calligrapher. Evaluating five state-of-the-art Chinese Optical Character Recognition (OCR) systems on NüshuVision shows that all fail entirely, each yielding a Character Error Rate (CER) of 1.0. Fine-tuning Microsoft’s TrOCR on NüshuVision lowers CER to 0.67, a modest yet meaningful improvement. These contributions establish the first multimodal foundation for Nüshu revitalization and offer a culturally grounded framework for language preservation.
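The results above are reported as Character Error Rate (CER). As a hedged illustration, assuming the common Levenshtein-based definition (the abstract does not state the exact implementation used), CER is the minimum number of character substitutions, deletions, and insertions needed to turn the OCR output into the reference, divided by the number of reference characters. A CER of 1.0 is what results, for example, when a system returns nothing, or returns only characters that never match the reference, such as standard Chinese characters in place of Nüshu. The function names `levenshtein` and `cer` below are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions, and insertions
    needed to transform hyp into ref (classic dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)   # distance from ref[:i] to empty hyp is i
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # reference char missing from output
                curr[j - 1] + 1,          # extra char in output
                prev[j - 1] + (r != h),   # substitution (free if chars match)
            )
        prev = curr
    return prev[len(hyp)]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length.
    Assumed definition for illustration; not taken from the paper."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# Three Nüshu code points (U+1B170..U+1B172) vs. a Chinese-only OCR output.
nushu_ref = "\U0001B170\U0001B171\U0001B172"
print(cer(nushu_ref, "你好吗"))   # every character substituted
print(cer(nushu_ref, ""))         # empty output: every character deleted
```

Because Python strings are indexed by code point, each Nüshu character (encoded outside the Basic Multilingual Plane) counts as a single character here. Under this unnormalized definition, CER can also exceed 1.0 when the output contains many extra characters.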
Anthology ID:
2025.emnlp-main.627
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
12430–12439
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.627/
Cite (ACL):
Ivory Yang, Xiaobo Guo, Yuxin Wang, Hefan Zhang, Yaning Jia, William Dinauer, and Soroush Vosoughi. 2025. Recontextualizing Revitalization: A Mixed Media Approach to Reviving the Nüshu Language. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 12430–12439, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Recontextualizing Revitalization: A Mixed Media Approach to Reviving the Nüshu Language (Yang et al., EMNLP 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.627.pdf