Mutee U Rahman
2025
Universal Dependencies for Sindhi
John Bauer | Sakiina Shah | Muhammad Shaheer | Mir Afza Ahmed Talpur | Zubair Sanjrani | Sarwat Qureshi | Shafi Pirzada | Christopher D. Manning | Mutee U Rahman
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)
Sindhi is an Indo-Aryan language spoken primarily in Pakistan and India by about 40 million people. Despite this extensive use, it is a low-resource language for NLP tasks, with few datasets or pretrained embeddings available. In this work, we explore the linguistic challenges of annotating Sindhi in the UD paradigm, such as the language-specific analysis of adpositions and verb forms. Building on this analysis, we present a newly annotated dependency treebank for Universal Dependencies, along with pretrained embeddings and an annotation pipeline built specifically for Sindhi.