Transliterating Urdu for a Broad-Coverage Urdu/Hindi LFG Grammar

Muhammad Kamran Malik, Tafseer Ahmed, Sebastian Sulger, Tina Bögel, Atif Gulzar, Ghulam Raza, Sarmad Hussain, Miriam Butt


Abstract
In this paper, we present a system for transliterating the Arabic-based script of Urdu to a Roman transliteration scheme. The system is integrated into a larger system consisting of a morphology module, implemented via finite state technologies, and a computational LFG grammar of Urdu that was developed with the grammar development platform XLE (Crouch et al. 2008). Our long-term goal is to handle Hindi alongside Urdu; the two languages are very similar with respect to syntax and lexicon and hence, one grammar can be used to cover both languages. However, they are not similar concerning the script -- Hindi is written in Devanagari, while Urdu uses an Arabic-based script. By abstracting away to a common Roman transliteration scheme in the respective transliterators, our system can be enabled to handle both languages in parallel. In this paper, we discuss the pipeline architecture of the Urdu-Roman transliterator, mention several linguistic and orthographic issues and present the integration of the transliterator into the LFG parsing system.
Anthology ID:
L10-1130
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/194_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Muhammad Kamran Malik, Tafseer Ahmed, Sebastian Sulger, Tina Bögel, Atif Gulzar, Ghulam Raza, Sarmad Hussain, and Miriam Butt. 2010. Transliterating Urdu for a Broad-Coverage Urdu/Hindi LFG Grammar. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
Transliterating Urdu for a Broad-Coverage Urdu/Hindi LFG Grammar (Malik et al., LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/194_Paper.pdf