Thepchai Supnithi


NECTEC’s Participation in WAT-2021
Zar Zar Hlaing | Ye Kyaw Thu | Thazin Myint Oo | Mya Ei San | Sasiporn Usanavasin | Ponrudee Netisopakul | Thepchai Supnithi
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

In this paper, we report the experimental results of the machine translation models developed by the NECTEC team for the translation tasks of WAT-2021. Our models are based on neural methods for both the English-to-Myanmar and Myanmar-to-English directions. Most existing neural machine translation (NMT) models focus on converting sequential data and do not directly use syntactic information. In contrast, we build multi-source NMT models using multilingual corpora such as a string data corpus, a tree data corpus, or a POS-tagged data corpus. Multi-source translation is an approach that exploits multiple inputs (e.g. in two different formats) to increase translation accuracy. We experimented with an RNN-based encoder-decoder model with an attention mechanism and with transformer architectures. The experimental results showed that the proposed RNN-based models outperform the baseline model for the English-to-Myanmar translation task, and that the multi-source and shared-multi-source transformer models yield better translation results than the baseline.


Statistical Machine Translation between Myanmar (Burmese) and Dawei (Tavoyan)
Thazin Myint Oo | Ye Kyaw Thu | Khin Mar Soe | Thepchai Supnithi
Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019 - Short Papers

String Similarity Measures for Myanmar Language (Burmese)
Khaing Hsu Wai | Ye Kyaw Thu | Hnin Aye Thant | Swe Zin Moe | Thepchai Supnithi
Proceedings of the First International Workshop on NLP Solutions for Under Resourced Languages (NSURL 2019) co-located with ICNLSP 2019 - Short Papers


Proceedings of the IJCNLP 2017, System Demonstrations
Seong-Bae Park | Thepchai Supnithi
Proceedings of the IJCNLP 2017, System Demonstrations


Improvement of Statistical Machine Translation using Character-Based Segmentation with Monolingual and Bilingual Information
Vipas Sutantayawalee | Peerachet Porkaew | Prachya Boonkwan | Sitthaa Phaholphinyo | Thepchai Supnithi
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

Character-Cluster-Based Segmentation using Monolingual and Bilingual Information for Statistical Machine Translation
Vipas Sutantayawalee | Peerachet Porkeaw | Thepchai Supnithi | Prachya Boonkwan | Sitthaa Phaholphinyo
Proceedings of the Fifth Workshop on South and Southeast Asian Natural Language Processing


Automatic Transformation of the Thai Categorial Grammar Treebank to Dependency Trees
Christian Rishøj | Taneth Ruangrajitpakorn | Prachya Boonkwan | Thepchai Supnithi
Proceedings of 5th International Joint Conference on Natural Language Processing


A Supervised Learning based Chunking in Thai using Categorial Grammar
Thepchai Supnithi | Chanon Onman | Peerachet Porkaew | Taneth Ruangrajitpakorn | Kanokorn Trakultaweekool | Asanee Kawtrakul
Proceedings of the Eighth Workshop on Asian Language Resources

A Current Status of Thai Categorial Grammars and Their Applications
Taneth Ruangrajitpakorn | Thepchai Supnithi
Proceedings of the Eighth Workshop on Asian Language Resources

AutoTagTCG : A Framework for Automatic Thai CG Tagging
Thepchai Supnithi | Taneth Ruangrajitpakorn | Kanokorn Trakultaweekool | Peerachet Porkaew
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper aims to develop a framework for automatic CG tagging. We investigated two main algorithms: CRF and a statistical alignment model based on information theory (SAM). We found that SAM gives the best results at both the word level and the sentence level, with an accuracy of 89.25% at the word level and 82.49% at the sentence level. Combining both methods can suit both known and unknown words.
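
The two figures above measure different things. The following is a minimal sketch of the distinction, assuming the usual definitions (word-level accuracy counts correctly tagged words; sentence-level accuracy counts sentences in which every word is tagged correctly). The function names and the toy tags are hypothetical and are not taken from the paper.

```python
def word_accuracy(gold, pred):
    """Fraction of words whose predicted tag equals the gold tag."""
    total = sum(len(g) for g in gold)
    correct = sum(gt == pt for g, p in zip(gold, pred) for gt, pt in zip(g, p))
    return correct / total


def sentence_accuracy(gold, pred):
    """Fraction of sentences in which every predicted tag is correct."""
    exact = sum(g == p for g, p in zip(gold, pred))
    return exact / len(gold)


# Toy data: hypothetical CG-style tags, one inner list per sentence.
gold = [["np", "s\\np/np", "np"], ["np", "s\\np"]]
pred = [["np", "s\\np/np", "np"], ["np", "np"]]

print(word_accuracy(gold, pred))      # 4 of 5 words are correct
print(sentence_accuracy(gold, pred))  # 1 of 2 sentences is fully correct
```

A single wrong tag lowers word-level accuracy only slightly but makes the whole sentence count as wrong, which is why the sentence-level figure is typically the lower of the two.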


A Syntactic Resource for Thai: CG Treebank
Taneth Ruangrajitpakorn | Kanokorn Trakultaweekoon | Thepchai Supnithi
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)


OpenCCG Workbench and Visualization Tool
Thepchai Supnithi | Suchinder Singh | Taneth Ruangrajitpakorn | Prachya Boonkwan | Monthika Boriboon
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Combinatory Categorial Grammar (CCG) is a lexicalized grammar formalism which is expressed by syntactic categories and a logical form representation. CCG is difficult to represent without visualization tools. This paper presents a design framework for the OpenCCG workbench and visualization tool, which enables linguists to develop CCG-based lexicons more easily. Our research aims to resolve these difficulties by developing a user-friendly tool. OpenCCG Workbench, an open source web-based environment, was developed to enable multiple users to visually create and update grammars for use with the OpenCCG library. It was designed to streamline and speed up the lexicon building process, and to free linguists from writing XML files, a task that is both cumbersome and error-prone. The system consists of three sub-systems: a grammar management system, a grammar validator system, and a concordance retrieval system. In this paper we mainly discuss the most important parts, the grammar management and validation systems, which are directly related to CCG lexicon construction. We support users at three levels: expert linguists, who act as lexical entry designers; normal linguists, who add or edit lexicons; and guests, who need to acquire the lexicon for use in their applications.

Memory-Inductive Categorial Grammar: An Approach to Gap Resolution in Analytic-Language Translation
Prachya Boonkwan | Thepchai Supnithi
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Speech-to-Speech Translation Activities in Thailand
Chai Wutiwiwatchai | Thepchai Supnithi | Krit Kosawat
Proceedings of the Workshop on Technologies and Corpora for Asia-Pacific Speech Translation (TCAST)


A Practical of Memory-based Approach for Improving Accuracy of MT
Sitthaa Phaholphinyo | Teerapong Modhiran | Nattapol Kritsuthikul | Thepchai Supnithi
Proceedings of Machine Translation Summit X: Papers

The Rule-Based Machine Translation (RBMT) [1] approach is a major approach in MT research. It needs linguistic knowledge to create appropriate translation rules. However, we cannot add all linguistic rules to the system, because new rules may conflict with existing ones. We therefore propose a memory-based approach that improves translation quality without modifying the existing linguistic rules. This paper analyses the translation problems and shows how this approach works.
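
The idea described above, consulting stored translations before falling back to the rule engine so that the existing rules stay untouched, can be sketched roughly as follows. This is only an illustration under stated assumptions: the exact-match lookup, the `rule_based_translate` placeholder, and the sample memory entry are hypothetical and do not reproduce the system in the paper.

```python
def rule_based_translate(sentence):
    """Placeholder standing in for an existing rule-based engine (hypothetical)."""
    return "<RBMT output for: " + sentence + ">"


def translate(sentence, memory):
    """Return a stored translation when one exists, else fall back to the rules."""
    key = " ".join(sentence.lower().split())  # normalise case and whitespace
    if key in memory:
        return memory[key]
    return rule_based_translate(sentence)


# Hypothetical memory of previously corrected translations.
memory = {"good morning": "สวัสดีตอนเช้า"}

print(translate("Good  morning", memory))  # served from the memory
print(translate("See you later", memory))  # falls back to the rule engine
```

Because corrections live in the memory table, a bad translation can be fixed by adding an entry instead of editing or adding a rule, which avoids the rule conflicts the abstract mentions.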

PARSIT-TE: Online Thai-English Machine Translation
Teerapong Modhiran | Krit Kosawat | Supon Klaithin | Monthika Boriboon | Thepchai Supnithi
Proceedings of Machine Translation Summit X: Posters

This paper presents an online Thai-English MT system, called PARSIT-TE, which is an extension of the PARSIT English-Thai system. We aim to help foreigners and Thais exchange information more easily. The system follows a rule-based, interlingua approach. To improve the system, we concentrate on the pre-processing and rule analysis phases, which are considered necessary because of some problems specific to the Thai language.


Automatic Error Detection in the Japanese Learners’ English Spoken Data
Emi Izumi | Kiyotaka Uchimoto | Toyomi Saiga | Thepchai Supnithi | Hitoshi Isahara
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics


A Cross System Machine Translation
Thepchai Supnithi | Virach Sornlertlamvanich | Thatsanee Charoenporn
COLING-02: Machine Translation in Asia