Maaz Anwar


2016

A Proposition Bank of Urdu
Maaz Anwar | Riyaz Ahmad Bhat | Dipti Sharma | Ashwini Vaidya | Martha Palmer | Tafseer Ahmed Khan
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes our efforts to develop a Proposition Bank for Urdu, an Indo-Aryan language. Our primary goal is to label syntactic nodes in the existing Urdu dependency Treebank with specific argument labels. In essence, this involves annotating the predicate-argument structures of both simple and complex predicates in the Treebank corpus. We describe the overall process of building the Urdu PropBank. We discuss various statistics pertaining to the Urdu PropBank and the issues that the annotators encountered while developing it. We also discuss how these challenges were addressed to successfully expand the PropBank corpus. In reporting the inter-annotator agreement between the two annotators, we show that they share a similar understanding of the annotation guidelines and of the linguistic phenomena present in the language. The PropBank currently comprises around 180,000 tokens, which have been double-propbanked by the two annotators for simple predicates. Another 100,000 tokens have been annotated for complex predicates of Urdu.
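The abstract does not say which agreement measure was used. As an illustration only, the sketch below computes Cohen's kappa, one common choice for two annotators, over a pair of label sequences. The function name `cohen_kappa` and the toy labels are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: each position in the two lists refers to the same
    annotated item (for example, the same dependency node).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's label proportions.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data, invented for illustration.
annotator_a = ["ARG0", "ARG1", "ARG0", "ARG1"]
annotator_b = ["ARG0", "ARG1", "ARG1", "ARG1"]
kappa = cohen_kappa(annotator_a, annotator_b)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when a few labels dominate the annotation.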

Towards Building Semantic Role Labeler for Indian Languages
Maaz Anwar | Dipti Sharma
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present a statistical system for identifying the semantic relationships, or semantic roles, for two major Indian languages, Hindi and Urdu. Given an input sentence and a predicate (verb), the system first identifies the arguments pertaining to that verb and then classifies each of them into one of the semantic labels, such as DOER, THEME, LOCATIVE, CAUSE, or PURPOSE. The system is based on two statistical classifiers trained on roughly 130,000 words for Urdu and 100,000 words for Hindi that were hand-annotated with semantic roles under the PropBank project for these two languages. Our system achieves an accuracy of 86% in identifying the arguments of a verb for Hindi and 75% for Urdu. At the subsequent task of classifying the constituents into their semantic roles, the Hindi system achieved 58% precision and 42% recall, whereas the Urdu system performed better and achieved 83% precision and 80% recall. Our study also allowed us to compare the usefulness of different linguistic features and feature combinations in the semantic role labeling task. We also examine the use of statistical syntactic parsing as a feature in the role labeling task.
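The two-stage design described in this abstract (argument identification followed by role classification) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the `Token` fields, the function names, and the rule-based stand-ins for the two trained classifiers are all hypothetical, and the dependency labels `subj`, `obj`, and `loc` are generic placeholders rather than the treebank's actual tag set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Token:
    form: str  # surface word
    head: int  # index of the syntactic head, -1 for the root
    dep: str   # dependency label (generic placeholder values)


def label_roles(
    tokens: List[Token],
    predicate_index: int,
    is_argument: Callable[[Token, int], bool],
    classify_role: Callable[[Token], str],
) -> Dict[int, str]:
    """Stage 1 filters candidate tokens; stage 2 labels the survivors."""
    roles: Dict[int, str] = {}
    for i, token in enumerate(tokens):
        if i == predicate_index:
            continue
        # Stage 1: argument identification (binary decision).
        if is_argument(token, predicate_index):
            # Stage 2: role classification (multi-class decision).
            roles[i] = classify_role(token)
    return roles


# Toy stand-ins for the two statistical classifiers (assumptions).
def toy_is_argument(token: Token, predicate_index: int) -> bool:
    return token.head == predicate_index


def toy_classify_role(token: Token) -> str:
    mapping = {"subj": "DOER", "obj": "THEME", "loc": "LOCATIVE"}
    return mapping.get(token.dep, "OTHER")


# English glosses used as a toy sentence; "the" attaches to "book",
# so it is not treated as an argument of the verb.
sentence = [
    Token("Ali", 3, "subj"),
    Token("the", 2, "det"),
    Token("book", 3, "obj"),
    Token("read", -1, "root"),
]
roles = label_roles(sentence, 3, toy_is_argument, toy_classify_role)
```

Keeping the two stages as separate callables mirrors the abstract's separate reporting of identification accuracy and classification precision and recall, since each stage can then be evaluated on its own.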