A General Method for Creating a Bilingual Transliteration Dictionary

Amit Kirschenbaum, Shuly Wintner


Abstract
Transliteration is the rendering in one language of terms from another language (and, possibly, another writing system), approximating spelling and/or phonetic equivalents between the two languages. A transliteration dictionary is a crucial resource for a variety of natural language applications, most notably machine translation. We describe a general method for creating bilingual transliteration dictionaries from Wikipedia article titles. The method can be applied to any language pair with Wikipedia presence, independently of the writing systems involved, and requires only a single simple resource that can be provided by any literate bilingual speaker. It was successfully applied to extract a Hebrew-English transliteration dictionary which, when incorporated in a machine translation system, indeed improved its performance.
Anthology ID:
L10-1003
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/16_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Amit Kirschenbaum and Shuly Wintner. 2010. A General Method for Creating a Bilingual Transliteration Dictionary. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
A General Method for Creating a Bilingual Transliteration Dictionary (Kirschenbaum & Wintner, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/16_Paper.pdf