Phuong Le Hong

Also published as: H. Phuong Le, Hong-Phuong Le, Hồng Phương, Phuong Le-Hong, Phương Lê Hồng


2024

Improving Multi-Label Classification of Similar Languages by Semantics-Aware Word Embeddings
The Ngo | Thi Anh Nguyen | My Ha | Thi Minh Nguyen | Phuong Le-Hong
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)

The VLP team participated in the DSL-ML shared task of the VarDial 2024 workshop, which aims to distinguish texts in similar languages. This paper presents our approach to solving the problem and discusses our experimental and official results. We propose to integrate semantics-aware word embeddings, which are learned from ConceptNet, into a bidirectional long short-term memory network. This approach achieves good performance – our system is ranked in the top two or three of the best performing teams for the task.

2023

Two Neural Models for Multilingual Grammatical Error Detection
Phuong Le-Hong | The Quyen Ngo | Thi Minh Huyen Nguyen
Proceedings of the 12th Workshop on NLP for Computer Assisted Language Learning

2020

Improving Sequence Tagging for Vietnamese Text using Transformer-based Neural Models
The Viet Bui | Thi Oanh Tran | Phuong Le-Hong
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

2017

The Importance of Automatic Syntactic Features in Vietnamese Named Entity Recognition
Thai-Hoang Pham | Phuong Le-Hong
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

NNVLP: A Neural Network-Based Vietnamese Language Processing Toolkit
Thai-Hoang Pham | Xuan-Khoai Pham | Tuan-Anh Nguyen | Phuong Le-Hong
Proceedings of the IJCNLP 2017, System Demonstrations

This paper demonstrates NNVLP, a neural network-based toolkit for essential Vietnamese language processing tasks including part-of-speech (POS) tagging, chunking, and named entity recognition (NER). Our toolkit combines a bidirectional Long Short-Term Memory (Bi-LSTM) network, a Convolutional Neural Network (CNN), and a Conditional Random Field (CRF), using pre-trained word embeddings as input, and it outperforms previously published toolkits on these three tasks. We provide both an API and a web demo for this toolkit.

2010

Automated Extraction of Tree Adjoining Grammars from a Treebank for Vietnamese
Phuong Le-Hong | Thi Minh Huyen Nguyen | Phuong Thai Nguyen | Azim Roussanaly
Proceedings of the 10th International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+10)

An empirical study of maximum entropy approach for part-of-speech tagging of Vietnamese texts
Phuong Le-Hong | Azim Roussanaly | Thi Minh Huyen Nguyen | Mathias Rossignol
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This paper presents an empirical study on the application of the maximum entropy approach for part-of-speech tagging of Vietnamese text, a language with special characteristics which largely distinguish it from occidental languages. Our best tagger explores and includes useful knowledge sources for tagging Vietnamese text and gives a 93.40% overall accuracy and an 80.69% unknown word accuracy on a test set of the Vietnamese treebank. Our tagger significantly outperforms the tagger that is being used for building the Vietnamese treebank, and as far as we are aware, this is the best tagging result ever published for the Vietnamese language.

2009

Building a Large Syntactically-Annotated Corpus of Vietnamese
Phuong-Thai Nguyen | Xuan-Luong Vu | Thi-Minh-Huyen Nguyen | Van-Hiep Nguyen | Hong-Phuong Le
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

Finite-State Description of Vietnamese Reduplication
Phuong Le Hong | Thi Minh Huyen Nguyen | Azim Roussanaly
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2008

Word Segmentation of Vietnamese Texts: a Comparison of Approaches
Quang Thắng Đinh | Hồng Phương Lê | Thị Minh Huyền Nguyễn | Cẩm Tú Nguyễn | Mathias Rossignol | Xuân Lương Vũ
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we present a comparison between three segmentation systems for the Vietnamese language. The majority of Vietnamese words are built by semantic composition from about 7,000 syllables, which also have a meaning as isolated words. The identification of word boundaries in a text is therefore not a simple task, and ambiguities often appear. Beyond the presentation of the tested systems, we also propose a standard definition for word segmentation in Vietnamese, and introduce a reference corpus developed for the purpose of evaluating such a task. The results observed confirm that word segmentation can be handled relatively well by automatic means, although a solution needs to be found to take into account out-of-vocabulary words.

A Metagrammar for Vietnamese LTAG
Phương Lê Hồng | Thị Minh Huyền Nguyễn | Azim Roussanaly
Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9)

2006

A Lexicalized Tree-Adjoining Grammar for Vietnamese
H. Phuong Le | T. M. Huyen Nguyen | Laurent Romary | Azim Roussanaly
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper, we present the first sizable grammar built for Vietnamese using LTAG, developed over the past two years, named vnLTAG. This grammar aims at modelling written language and is general enough to be both application- and domain-independent. It can be used for the morpho-syntactic tagging and syntactic parsing of Vietnamese texts, as well as text generation. We then present a robust parsing scheme using vnLTAG and a parser for the grammar. We finish with an evaluation using a test suite.