Xuan Luong Vu

Also published as: Xuan-Luong Vu, Xuân Lương Vũ


Building a Large Syntactically-Annotated Corpus of Vietnamese
Phuong-Thai Nguyen | Xuan-Luong Vu | Thi-Minh-Huyen Nguyen | Van-Hiep Nguyen | Hong-Phuong Le
Proceedings of the Third Linguistic Annotation Workshop (LAW III)


Word Segmentation of Vietnamese Texts: a Comparison of Approaches
Quang Thắng Đinh | Hồng Phương Lê | Thị Minh Huyền Nguyễn | Cẩm Tú Nguyễn | Mathias Rossignol | Xuân Lương Vũ
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present in this paper a comparison of three word segmentation systems for the Vietnamese language. Most Vietnamese words are built by semantic composition from about 7,000 syllables, which also carry meaning as isolated words. Identifying word boundaries in a text is therefore not a simple task, and ambiguities often arise. Beyond presenting the tested systems, we also propose a standard definition of word segmentation in Vietnamese and introduce a reference corpus developed to evaluate this task. The results confirm that the task can be handled relatively well by automatic means, although a solution is still needed to account for out-of-vocabulary words.


Developping Tools and Building Linguistic Resources for Vietnamese Morpho-syntactic Processing
Thanh Bon Nguyen | Thi Minh Huyen Nguyen | Laurent Romary | Xuan Luong Vu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)