A Myanmar (Burmese)-English Named Entity Transliteration Dictionary

Aye Myat Mon, Chenchen Ding, Hour Kaing, Khin Mar Soe, Masao Utiyama, Eiichiro Sumita


Abstract
Transliteration is generally a phonetically based transcription from one writing system to another. It is a crucial task for various downstream natural language processing applications. For the Myanmar (Burmese) language, robust automatic transliteration of borrowed English words is challenging because of the complex Myanmar writing system and the lack of data. In this study, we constructed a Myanmar-English named entity dictionary containing more than eighty thousand transliteration instances. The data have been released under a CC BY-NC-SA license. Using the prepared data, we evaluated automatic transliteration with statistical and neural network-based approaches. The neural network model significantly outperformed the statistical model in terms of character-level BLEU score. Different processing units for the Myanmar script were also compared and discussed.
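Character-level BLEU applies the usual BLEU formula (clipped n-gram precisions combined by a geometric mean, times a brevity penalty) to sequences of characters rather than words. The abstract does not give the exact evaluation configuration, so the sketch below is only an illustration under stated assumptions: one reference per output, n-grams up to 4 with uniform weights, no smoothing, corpus-level aggregation, and "character" taken to mean a Unicode code point. The function names `ngrams` and `char_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(units, n):
    """Return a Counter of all n-grams (as tuples) in a unit sequence."""
    return Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))


def char_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU computed over characters instead of words.

    Assumptions (not taken from the paper): one reference per hypothesis,
    uniform weights over 1..max_n-grams, no smoothing, and each Unicode
    code point (including spaces) counts as one character.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = list(hyp), list(ref)  # split strings into code points
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # many times as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any zero n-gram precision makes BLEU zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize outputs shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    print(char_bleu(["london"], ["london"]))  # identical strings
    print(char_bleu(["londun"], ["london"]))  # one substituted character
```

Because Myanmar vowel signs, medials, and the asat are separate code points, splitting with `list()` yields a finer unit than a syllable; swapping in another segmentation (for example, syllables) only changes how `h` and `r` are built, which is one way the unit comparison mentioned in the abstract could be set up.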
Anthology ID:
2020.lrec-1.364
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2980–2983
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.364
Cite (ACL):
Aye Myat Mon, Chenchen Ding, Hour Kaing, Khin Mar Soe, Masao Utiyama, and Eiichiro Sumita. 2020. A Myanmar (Burmese)-English Named Entity Transliteration Dictionary. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2980–2983, Marseille, France. European Language Resources Association.
Cite (Informal):
A Myanmar (Burmese)-English Named Entity Transliteration Dictionary (Myat Mon et al., LREC 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.364.pdf