Ye Kyaw Thu


2021

NECTEC’s Participation in WAT-2021
Zar Zar Hlaing | Ye Kyaw Thu | Thazin Myint Oo | Mya Ei San | Sasiporn Usanavasin | Ponrudee Netisopakul | Thepchai Supnithi
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

In this paper, we report the experimental results of the machine translation models developed by the NECTEC team for the translation tasks of WAT-2021. Our models are based on neural methods for both the English-Myanmar and Myanmar-English directions. Most existing neural machine translation (NMT) models focus on converting sequential data and do not directly use syntactic information. In contrast, we build multi-source NMT models using multilingual corpora such as a string data corpus, a tree data corpus, or a POS-tagged data corpus. Multi-source translation is an approach that exploits multiple inputs (e.g. in two different formats) to increase translation accuracy. We experimented with an RNN-based encoder-decoder model with an attention mechanism and with transformer architectures. The experimental results show that the proposed RNN-based models outperform the baseline model for the English-to-Myanmar translation task, and that the multi-source and shared-multi-source transformer models yield better translation results than the baseline.

Hybrid Statistical Machine Translation for English-Myanmar: UTYCC Submission to WAT-2021
Ye Kyaw Thu | Thazin Myint Oo | Hlaing Myat Nwe | Khaing Zar Mon | Nang Aeindray Kyaw | Naing Linn Phyo | Nann Hwan Khun | Hnin Aye Thant
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

In this paper, we describe our submissions to WAT-2021 (Nakazawa et al., 2021) for the English-to-Myanmar (Burmese) task. Our team, ID: “YCC-MT1”, focused on bringing transliteration knowledge to the decoder without changing the model. We manually extracted transliteration word/phrase pairs from the ALT corpus and applied the XML markup feature of the Moses decoder (i.e. -xml-input exclusive, -xml-input inclusive). We demonstrate that this hybrid translation technique can significantly improve (by around 6 BLEU points) the baselines of three well-known approaches: “Phrase-based SMT”, “Operation Sequence Model” and “Hierarchical Phrase-based SMT”. Moreover, this simple hybrid method achieved the second highest results among the submitted MT systems for the English-to-Myanmar WAT2021 translation shared task according to BLEU (Papineni et al., 2002) and AMFM scores (Banchs et al., 2015).
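
As a rough illustration of the XML markup idea in the abstract above, the following sketch wraps source words that have a known transliteration in a tag carrying a `translation` attribute, which is the input form the Moses `-xml-input` option reads. The function name `mark_transliterations`, the tag name `tl`, and the example pair are assumptions made for illustration; they are not taken from the paper, and the sketch does not handle escaping of unmarked tokens.

```python
from xml.sax.saxutils import quoteattr


def mark_transliterations(tokens, pairs, tag="tl"):
    """Wrap source phrases found in `pairs` in Moses-style XML markup.

    `pairs` maps a tuple of source tokens to its target-side transliteration.
    The tag name is arbitrary (an assumption here); Moses reads the
    `translation` attribute of the marked span.
    """
    out = []
    i = 0
    max_len = max((len(key) for key in pairs), default=0)
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in pairs:
                attr = quoteattr(pairs[key])  # adds quotes and escapes the value
                out.append(f"<{tag} translation={attr}>{' '.join(key)}</{tag}>")
                i += n
                break
        else:
            # No known transliteration starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


# Hypothetical transliteration pair, for illustration only.
example_pairs = {("Yangon",): "ရန်ကုန်"}
print(mark_transliterations("I live in Yangon".split(), example_pairs))
```

The marked-up sentence would then be passed to the decoder with `-xml-input exclusive` (only the supplied translation is used for the span) or `-xml-input inclusive` (the supplied translation competes with phrase-table entries).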

2019

Statistical Machine Translation between Myanmar (Burmese) and Dawei (Tavoyan)
Thazin Myint Oo | Ye Kyaw Thu | Khin Mar Soe | Thepchai Supnithi
Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019 - Short Papers

String Similarity Measures for Myanmar Language (Burmese)
Khaing Hsu Wai | Ye Kyaw Thu | Hnin Aye Thant | Swe Zin Moe | Thepchai Supnithi
Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019 - Short Papers

Neural Machine Translation between Myanmar (Burmese) and Rakhine (Arakanese)
Thazin Myint Oo | Ye Kyaw Thu | Khin Mar Soe
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

This work explores neural machine translation between Myanmar (Burmese) and Rakhine (Arakanese). Rakhine is a language closely related to Myanmar, often considered a dialect. We implemented three prominent neural machine translation (NMT) systems: recurrent neural networks (RNN), transformer, and convolutional neural networks (CNN). The systems were evaluated on a Myanmar-Rakhine parallel text corpus developed by us. In addition, two types of word segmentation schemes for word embeddings were studied: Word-BPE and Syllable-BPE segmentation. Our experimental results clearly show that the highest quality NMT and statistical machine translation (SMT) performances are obtained with Syllable-BPE segmentation for both types of translations. If we focus on NMT, we find that the transformer with Word-BPE segmentation outperforms CNN and RNN for both Myanmar-Rakhine and Rakhine-Myanmar translation. However, CNN with Syllable-BPE segmentation obtains a higher score than the RNN and transformer.

2018

UCSYNLP-Lab Machine Translation Systems for WAT 2018
Yi Mon Shwe Sin | Thazin Myint Oo | Hsu Myat Mo | Win Pa Pa | Khim Mar Soe | Ye Kyaw Thu
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation

2016

Introducing the Asian Language Treebank (ALT)
Ye Kyaw Thu | Win Pa Pa | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces the ALT project initiated by the Advanced Speech Translation Research and Development Promotion Center (ASTREC), NICT, Kyoto, Japan. The aim of this project is to accelerate NLP research for Asian languages such as Indonesian, Japanese, Khmer, Lao, Malay, Myanmar, Filipino, Thai and Vietnamese. The original resource for this project was a set of English articles randomly selected from Wikinews. The project has so far created a corpus for Myanmar and will extend in scope to include other languages in the near future. A 20,000-sentence Myanmar corpus, manually translated from an English corpus, has been word segmented, word aligned, part-of-speech tagged and constituency parsed by human annotators. In this paper, we present the implementation steps for creating the treebank in detail, including a description of the ALT web-based treebanking tool. Moreover, we report statistics on the annotation quality of the Myanmar treebank created so far.

Interlocking Phrases in Phrase-based Statistical Machine Translation
Ye Kyaw Thu | Andrew Finch | Eiichiro Sumita
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Comparison of Grapheme-to-Phoneme Conversion Methods on a Myanmar Pronunciation Dictionary
Ye Kyaw Thu | Win Pa Pa | Yoshinori Sagisaka | Naoto Iwahashi
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)

Grapheme-to-Phoneme (G2P) conversion is the task of predicting the pronunciation of a word given its graphemic or written form. It is a highly important part of both automatic speech recognition (ASR) and text-to-speech (TTS) systems. In this paper, we evaluate seven G2P conversion approaches on a manually tagged Myanmar phoneme dictionary: Adaptive Regularization of Weight Vectors (AROW) based structured learning (S-AROW), Conditional Random Field (CRF), Joint-sequence models (JSM), phrase-based statistical machine translation (PBSMT), Recurrent Neural Network (RNN), Support Vector Machine (SVM) based point-wise classification, and Weighted Finite-state Transducers (WFST). The G2P bootstrapping experimental results were measured with both automatic phoneme error rate (PER) calculation and manual checking in terms of voiced/unvoiced, tone, consonant and vowel errors. The results show that the CRF, PBSMT and WFST approaches are the best performing methods for G2P conversion on the Myanmar language.
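
The abstract above does not say how PER was computed. A minimal sketch, assuming the usual definition (total Levenshtein edit distance between reference and predicted phoneme sequences, divided by the total number of reference phonemes), might look like this; the function names and the toy phoneme sequences are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference phoneme
                curr[j - 1] + 1,         # insertion of a hypothesis phoneme
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """PER over a test set: total edit errors / total reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Toy sequences, not taken from the Myanmar dictionary.
refs = [["k", "a", "t"], ["m", "a"]]
hyps = [["k", "a", "d"], ["m", "a"]]
print(phoneme_error_rate(refs, hyps))  # 1 error over 5 reference phonemes
```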

2015

A Large-scale Study of Statistical Machine Translation Methods for Khmer Language
Ye Kyaw Thu | Vichet Chea | Andrew Finch | Masao Utiyama | Eiichiro Sumita
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

2014

Integrating Dictionaries into an Unsupervised Model for Myanmar Word Segmentation
Ye Kyaw Thu | Andrew Finch | Eiichiro Sumita | Yoshinori Sagisaka
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing

Empirical dependency-based head finalization for statistical Chinese-, English-, and French-to-Myanmar (Burmese) machine translation
Chenchen Ding | Ye Kyaw Thu | Masao Utiyama | Andrew Finch | Eiichiro Sumita
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

We conduct dependency-based head finalization for statistical machine translation (SMT) for Myanmar (Burmese). Although Myanmar is an understudied language, linguistically it is a head-final language with syntax similar to Japanese and Korean, so applying efficient techniques from Japanese and Korean processing to Myanmar is a natural idea. Our approach combines two approaches: the first is head-driven phrase structure grammar (HPSG) based head finalization for English-to-Japanese translation; the second is dependency-based pre-ordering originally designed for English-to-Korean translation. We experiment on Chinese-, English-, and French-to-Myanmar translation, using a statistical pre-ordering approach as a comparison method. Experimental results show that dependency-based head finalization consistently improves a baseline SMT system across different source languages and different segmentation schemes for the Myanmar language.
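
To make the general idea of head finalization concrete, the sketch below reorders a source sentence so that every head word follows all of its dependents, given a dependency parse. This is a simplified illustration only: the function name `head_finalize` and the `heads` encoding (index of each token's head, `-1` for the root) are assumptions, and the paper's actual rules and exceptions are not reproduced here.

```python
def head_finalize(tokens, heads):
    """Reorder tokens so that each head comes after all of its dependents.

    `heads[i]` is the index of the head of token i, or -1 for a root.
    Dependents keep their original relative order.
    """
    children = {i: [] for i in range(len(tokens))}
    roots = []
    for i, h in enumerate(heads):
        if h == -1:
            roots.append(i)
        else:
            children[h].append(i)

    order = []

    def visit(i):
        # Emit the whole subtree of every dependent first, then the head.
        for c in children[i]:
            visit(c)
        order.append(i)

    for r in roots:
        visit(r)
    return [tokens[i] for i in order]


# "eat" is the root; "I" and "rice" depend on it.
print(head_finalize(["I", "eat", "rice"], [1, -1, 1]))
```

On this toy input the verb moves to the end of the sentence, which matches the subject-object-verb order expected of a head-final target language.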