2022
KC4MT: A High-Quality Corpus for Multilingual Machine Translation
Vinh Van Nguyen
|
Ha Nguyen
|
Huong Thanh Le
|
Thai Phuong Nguyen
|
Tan Van Bui
|
Luan Nghia Pham
|
Anh Tuan Phan
|
Cong Hoang-Minh Nguyen
|
Viet Hong Tran
|
Anh Huu Tran
Proceedings of the Thirteenth Language Resources and Evaluation Conference
A multilingual parallel corpus is an important resource for many applications of natural language processing (NLP). For machine translation, the size and quality of the training corpus largely determine the quality of the translation models. In this work, we present a method for building a high-quality multilingual parallel corpus in the news domain for several low-resource languages, including Vietnamese, Lao, and Khmer, in order to improve the quality of multilingual machine translation for these languages. We also publicly release the corpus, which includes 500,000 Vietnamese-Chinese sentence pairs, 150,000 Vietnamese-Lao sentence pairs, and 150,000 Vietnamese-Khmer sentence pairs.
2016
A Two-Phase Approach for Building Vietnamese WordNet
Thai Phuong Nguyen
|
Van-Lam Pham
|
Hoang-An Nguyen
|
Huy-Hien Vu
|
Ngoc-Anh Tran
|
Thi-Thu-Ha Truong
Proceedings of the 8th Global WordNet Conference (GWC)
Wordnets play an important role not only in linguistics but also in natural language processing (NLP). This paper reports the main results of a project that aims to construct a wordnet for the Vietnamese language. We propose a two-phase approach to building a Vietnamese WordNet that draws on available language resources while preserving Vietnamese-specific linguistic and cultural characteristics. We also present statistics and analyses that describe the characteristics of the resulting wordnet.
2009
Improving a Lexicalized Hierarchical Reordering Model Using Maximum Entropy
Vinh Van Nguyen
|
Akira Shimazu
|
Minh Le Nguyen
|
Thai Phuong Nguyen
Proceedings of Machine Translation Summit XII: Papers
2008
A Tree-to-String Phrase-based Model for Statistical Machine Translation
Thai Phuong Nguyen
|
Akira Shimazu
|
Tu-Bao Ho
|
Minh Le Nguyen
|
Vinh Van Nguyen
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning
2006
Improving Phrase-Based Statistical Machine Translation with Morpho-Syntactic Analysis and Transformation
Thai Phuong Nguyen
|
Akira Shimazu
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers
This paper presents our study of exploiting morpho-syntactic information for phrase-based statistical machine translation (SMT). For morphological transformation, we use hand-crafted transformational rules. For syntactic transformation, we propose a transformational model based on Bayes' formula. The model is trained using a bilingual corpus and a broad-coverage parser of the source language. The morphological and syntactic transformations are applied in the preprocessing phase of an SMT system. This preprocessing method is applicable to language pairs in which the target language is resource-poor. We applied the proposed method to English-to-Vietnamese translation. Our experiments showed a BLEU-score improvement of more than 3.28% over Pharaoh, a state-of-the-art phrase-based SMT system.