ShUD: the First Shanghainese Universal Dependency Treebank

Qizhen Yang


Abstract
This paper introduces ShUD, the first Universal Dependencies (UD) treebank for Shanghainese, a variety of Wu Chinese spoken by approximately 14 million people yet severely under-resourced in NLP. The treebank is built with a scalable annotation pipeline that exploits grammatical parallels between Shanghainese and Mandarin, and the pipeline also offers a practical strategy for bootstrapping resources for other Chinese dialects. We document syntactic phenomena unique to Shanghainese within the UD framework and fine-tune a dependency parser on the annotated treebank, laying a foundation for both NLP tool development and cross-linguistic syntactic research.
Anthology ID:
2025.udw-1.20
Volume:
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)
Month:
August
Year:
2025
Address:
Ljubljana, Slovenia
Editors:
Gosse Bouma, Çağrı Çöltekin
Venues:
UDW | WS | SyntaxFest
SIG:
SIGPARSE
Publisher:
Association for Computational Linguistics
Pages:
186–193
URL:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.udw-1.20/
Cite (ACL):
Qizhen Yang. 2025. ShUD: the First Shanghainese Universal Dependency Treebank. In Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025), pages 186–193, Ljubljana, Slovenia. Association for Computational Linguistics.
Cite (Informal):
ShUD: the First Shanghainese Universal Dependency Treebank (Yang, UDW-SyntaxFest 2025)
PDF:
https://preview.aclanthology.org/mtsummit-25-ingestion/2025.udw-1.20.pdf