Issues and Challenges in Annotating Urdu Action Verbs on the IMAGACT4ALL Platform

Sharmin Muzaffar, Pitambar Behera, Girish Jha


Abstract
In South Asian languages such as Hindi and Urdu, action verbs with compound constructions and serial verb constructions pose serious problems for natural language processing and other linguistic tasks. Urdu is an Indo-Aryan language spoken by 51,500,000 speakers in India. Action verbs that occur spontaneously in day-to-day communication are semantically highly ambiguous and consequently raise disambiguation issues that are relevant to Language Technologies (LT) such as Machine Translation (MT) and Natural Language Processing (NLP). IMAGACT4ALL is an ontology-driven, web-based platform developed by the University of Florence for storing action verbs and their inter-relations. The Florence group is currently collaborating with Jawaharlal Nehru University (JNU) in India to connect Indian languages to this platform. Action verbs occur frequently in both written and spoken discourse and, being polysemous, can convey a range of meanings. The IMAGACT4ALL platform stores 3D animated scenes, each of which may refer to a variety of possible ontological types; this makes the annotation task quite challenging, since the annotator must select a verb argument structure from among several probable alternatives. In this paper, the authors discuss issues and challenges such as complex predicates (compound and conjunct verbs), ambiguously animated video illustrations, semantic discrepancies, and verb-selection preferences, all of which have caused significant problems in annotating Urdu verbs on the IMAGACT ontology.
Anthology ID: L16-1230
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 1446–1451
URL: https://aclanthology.org/L16-1230
Cite (ACL): Sharmin Muzaffar, Pitambar Behera, and Girish Jha. 2016. Issues and Challenges in Annotating Urdu Action Verbs on the IMAGACT4ALL Platform. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1446–1451, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): Issues and Challenges in Annotating Urdu Action Verbs on the IMAGACT4ALL Platform (Muzaffar et al., LREC 2016)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/L16-1230.pdf