Thanaruk Theeramunkong


2023

Enhancing Translation of Myanmar Sign Language by Transfer Learning and Self-Training
Hlaing Myat Nwe | Kiyoaki Shirai | Natthawut Kertkeidkachorn | Thanaruk Theeramunkong | Ye Kyaw Thu | Thepchai Supnithi | Natsuda Kaothanthong
Proceedings of Machine Translation Summit XIX, Vol. 1: Research Track

This paper proposes a method to develop a machine translation (MT) system between Myanmar Sign Language (MSL) and Myanmar Written Language (MWL), in both directions, for the deaf community. Translating MSL is difficult because only a small MSL-MWL parallel corpus is available. To address this low-resource challenge, transfer learning is applied: an MT model is first trained on a high-resource language pair, American Sign Language (ASL) and English, and is then used as the initial model for training an MT model between MSL and MWL. The mT5 model is used as the base MT model in this transfer learning. Additionally, a self-training technique is applied to generate synthetic MSL-MWL translation pairs from a large monolingual MWL corpus. Furthermore, since sentence segmentation is required as a preprocessing step for MT of the Myanmar language, several segmentation schemes are empirically compared. Experimental results show that both transfer learning and self-training improve translation performance between MSL and MWL compared with a baseline model fine-tuned only on a small MSL-MWL parallel corpus.

2011

Multi-stage Annotation using Pattern-based and Statistical-based Techniques for Automatic Thai Annotated Corpus Construction
Nattapong Tongtep | Thanaruk Theeramunkong
Proceedings of the 9th Workshop on Asian Language Resources

2009

QAST: Question Answering System for ThaiWikipedia
Wittawat Jitkrittum | Choochart Haruechaiyasak | Thanaruk Theeramunkong
Proceedings of the 2009 Workshop on Knowledge and Reasoning for Answering Questions (KRAQ 2009)

2006

Word Knowledge Acquisition for Computational Lexicon Construction
Thatsanee Charoenporn | Canasai Kruengkrai | Thanaruk Theeramunkong | Virach Sornlertlamvanich | Hitoshi Isahara
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The growth of multilingual information processing technology has created a need for linguistic resources, especially lexical databases. Many attempts have been made to convert traditional dictionaries into computational dictionaries, widely known as computational lexicons. TCL’s Computational Lexicon (TCLLEX) is a recently developed large-scale Thai lexicon that aims to serve as a fundamental linguistic resource for natural language processing research. We design terminology and ontology for structuring the lexicon based on the ideas of computability and reusability.

2004

Thai Spelling Recognition Using a Continuous Speech Corpus
Chutima Pisarn | Thanaruk Theeramunkong
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2002

Multi-Dimensional Text Classification
Thanaruk Theeramunkong | Verayuth Lertnattee
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Non-Dictionary-Based Thai Word Segmentation Using Decision Trees
Thanaruk Theeramunkong | Sasiporn Usanavasin
Proceedings of the First International Conference on Human Language Technology Research

A Structure-Shared Trie Compression Method
Thanasan Tanhermhong | Thanaruk Theeramunkong | Wirat Chinnan
Proceedings of the 15th Pacific Asia Conference on Language, Information and Computation

1997

Grammar Acquisition Based on Clustering Analysis and Its Application to Statistical Parsing
Thanaruk Theeramunkong | Manabu Okumura
Fifth Workshop on Very Large Corpora

Exploiting Contextual Information in Hypothesis Selection for Grammar Refinement
Thanaruk Theeramunkong | Yasunobu Kawaguchi | Manabu Okumura
Computational Environments for Grammar Development and Linguistic Engineering

1996

Towards Automatic Grammar Acquisition from a Bracketed Corpus
Thanaruk Theeramunkong | Manabu Okumura
Fourth Workshop on Very Large Corpora