Automatic acquisition of Urdu nouns (along with gender and irregular plurals)

Tafseer Ahmed Khan


Abstract
This paper describes a set of methods for automatically acquiring Urdu nouns (and their gender) on the basis of inflectional and contextual clues. The algorithms combine brute-force search over the corpus with distinguishing rules carefully designed on the basis of linguistic knowledge. Because Urdu nouns, adjectives, and verbs share homographic inflections, we compare potential inflectional forms against paradigms of inflections in a strict order and give a best guess of the part of speech for each word. We also address irregular plurals, i.e., plural forms borrowed from Arabic, Persian, and English. The evaluation shows that not all of the borrowed rules are equally productive in Urdu. The commonly used borrowed plural rules are reported in the results.
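The idea of comparing potential inflectional forms against paradigms in a strict order can be sketched roughly as follows. This is only a minimal illustration, not the paper's implementation: the ASCII-romanized word forms, the three simplified paradigms, their ordering, and the names `PARADIGMS`, `guess`, `candidate_stems`, and `acquire` are all assumptions made for the example. The paper itself works on Urdu text and also uses contextual clues, which are omitted here.

```python
# Hypothetical, simplified paradigms over romanized forms (assumption).
# Each entry is (part of speech, gender, suffixes that must all be attested).
# The list is checked in strict order; the first full match wins.
PARADIGMS = [
    ("noun", "masculine", ("a", "e", "on")),   # larka, larke, larkon
    ("adjective", None, ("a", "e", "i")),      # accha, acche, acchi
    ("noun", "feminine", ("", "en", "on")),    # kitab, kitaben, kitabon
]


def guess(stem, vocabulary):
    """Return (pos, gender) for the first paradigm whose forms are all attested."""
    for pos, gender, suffixes in PARADIGMS:
        if all(stem + suffix in vocabulary for suffix in suffixes):
            return pos, gender
    return None


def candidate_stems(vocabulary):
    """Brute-force every stem obtainable by stripping a known suffix."""
    suffixes = {s for _, _, group in PARADIGMS for s in group}
    stems = set()
    for word in vocabulary:
        for suffix in suffixes:
            if suffix == "":
                stems.add(word)
            elif word.endswith(suffix) and len(word) > len(suffix):
                stems.add(word[: -len(suffix)])
    return stems


def acquire(words):
    """Map each stem that fully matches some paradigm to its best guess."""
    vocabulary = set(words)
    lexicon = {}
    for stem in sorted(candidate_stems(vocabulary)):
        result = guess(stem, vocabulary)
        if result is not None:
            lexicon[stem] = result
    return lexicon


if __name__ == "__main__":
    corpus_words = [
        "larka", "larke", "larkon",
        "accha", "acche", "acchi",
        "kitab", "kitaben", "kitabon",
    ]
    for stem, (pos, gender) in acquire(corpus_words).items():
        print(stem, pos, gender)
```

The ordering matters because the endings overlap: the `a` and `e` forms alone cannot separate a masculine noun from an adjective, so the sketch requires every form of a paradigm to be attested and lets the earlier paradigm win when more than one matches.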
Anthology ID:
L14-1650
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2846–2850
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/844_Paper.pdf
Cite (ACL):
Tafseer Ahmed Khan. 2014. Automatic acquisition of Urdu nouns (along with gender and irregular plurals). In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2846–2850, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Automatic acquisition of Urdu nouns (along with gender and irregular plurals) (Ahmed Khan, LREC 2014)