A Myanmar (Burmese)-English Named Entity Transliteration Dictionary
Aye Myat Mon, Chenchen Ding, Hour Kaing, Khin Mar Soe, Masao Utiyama, Eiichiro Sumita
Abstract
Transliteration is generally a phonetically based transcription across different writing systems. It is a crucial task for various downstream natural language processing applications. For the Myanmar (Burmese) language, robust automatic transliteration for borrowed English words is a challenging task because of the complex Myanmar writing system and the lack of data. In this study, we constructed a Myanmar-English named entity dictionary containing more than eighty thousand transliteration instances. The data have been released under a CC BY-NC-SA license. We evaluated the automatic transliteration performance using statistical and neural network-based approaches based on the prepared data. The neural network model outperformed the statistical model significantly in terms of the BLEU score on the character level. Different units used in the Myanmar script for processing were also compared and discussed.
- Anthology ID:
- 2020.lrec-1.364
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 2980–2983
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.364
- Cite (ACL):
- Aye Myat Mon, Chenchen Ding, Hour Kaing, Khin Mar Soe, Masao Utiyama, and Eiichiro Sumita. 2020. A Myanmar (Burmese)-English Named Entity Transliteration Dictionary. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 2980–2983, Marseille, France. European Language Resources Association.
- Cite (Informal):
- A Myanmar (Burmese)-English Named Entity Transliteration Dictionary (Myat Mon et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2020.lrec-1.364.pdf