A Proposition Bank of Urdu

Maaz Anwar, Riyaz Ahmad Bhat, Dipti Sharma, Ashwini Vaidya, Martha Palmer, Tafseer Ahmed Khan


Abstract
This paper describes our efforts to develop a Proposition Bank for Urdu, an Indo-Aryan language. Our primary goal is to label syntactic nodes in the existing Urdu dependency Treebank with specific argument labels. In essence, this involves annotating the predicate-argument structures of both simple and complex predicates in the Treebank corpus. We describe the overall process of building the Urdu PropBank. We discuss various statistics pertaining to the Urdu PropBank and the issues that the annotators encountered while developing it. We also discuss how these challenges were addressed to successfully expand the PropBank corpus. In reporting the inter-annotator agreement between the two annotators, we show that they share a similar understanding of the annotation guidelines and of the linguistic phenomena present in the language. The PropBank currently contains around 180,000 tokens, double-propbanked by the two annotators for simple predicates. Another 100,000 tokens have been annotated for complex predicates of Urdu.
Anthology ID:
L16-1377
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2379–2386
URL:
https://aclanthology.org/L16-1377
Cite (ACL):
Maaz Anwar, Riyaz Ahmad Bhat, Dipti Sharma, Ashwini Vaidya, Martha Palmer, and Tafseer Ahmed Khan. 2016. A Proposition Bank of Urdu. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2379–2386, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Proposition Bank of Urdu (Anwar et al., LREC 2016)
PDF:
https://preview.aclanthology.org/teach-a-man-to-fish/L16-1377.pdf