Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects

David Graff, Mohamed Maamouri


Abstract
The Linguistic Data Consortium and Georgetown University Press are collaborating to create updated editions of bilingual dictionaries that had originally been published in the 1960's for English-speaking learners of Moroccan, Syrian and Iraqi Arabic. In their first editions, these dictionaries used ad hoc Latin-alphabet orthography for each colloquial Arabic dialect, but adopted some properties of Arabic-based writing (collation order of Arabic headwords, clitic attachment to word forms in example phrases); despite their common features, there are notable differences among the three books that impede comparisons across the dialects, as well as comparisons of each dialect to Modern Standard Arabic. In updating these volumes, we use both Arabic script and International Phonetic Alphabet orthographies; the former provides a common basis for word recognition across dialects, while the latter provides dialect-specific pronunciations. Our goal is to preserve the full content of the original publications, supplement the Arabic headword inventory with new usages, and produce a uniform lexicon structure expressible via the Lexical Markup Framework (LMF, ISO 24613). To this end, we developed a relational database schema that applies consistently to each dialect, and HTTP-based tools for searching, editing, workflow, review and inventory management.
Anthology ID:
L12-1245
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
269–274
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/461_Paper.pdf
Cite (ACL):
David Graff and Mohamed Maamouri. 2012. Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 269–274, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects (Graff & Maamouri, LREC 2012)