Kiet Van Nguyen


pdf bib
Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension
Son Quoc Tran | Phong Nguyen-Thuan Do | Kiet Van Nguyen | Ngan Luu-Thuy Nguyen
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

Although the curse of multilinguality significantly restricts the language abilities of multilingual models in monolingual settings, researchers now still have to rely on multilingual models to develop state-of-the-art systems in Vietnamese Machine Reading Comprehension. This difficulty in researching is because of the limited number of high-quality works in developing Vietnamese language models. In order to encourage more work in this research field, we present a comprehensive analysis of language weaknesses and strengths of current Vietnamese monolingual models using the downstream task of Machine Reading Comprehension. From the analysis results, we suggest new directions for developing Vietnamese language models. Besides this main contribution, we also successfully reveal the existence of artifacts in Vietnamese Machine Reading Comprehension benchmarks and suggest an urgent need for new high-quality benchmarks to track the progress of Vietnamese Machine Reading Comprehension. Moreover, we also introduced a minor but valuable modification to the process of annotating unanswerable questions for Machine Reading Comprehension from previous work. Our proposed modification helps improve the quality of unanswerable questions to a higher level of difficulty for Machine Reading Comprehension systems to solve.


ViNLI: A Vietnamese Corpus for Studies on Open-Domain Natural Language Inference
Tin Van Huynh | Kiet Van Nguyen | Ngan Luu-Thuy Nguyen
Proceedings of the 29th International Conference on Computational Linguistics

Over a decade, the research field of computational linguistics has witnessed the growth of corpora and models for natural language inference (NLI) for rich-resource languages such as English and Chinese. A large-scale and high-quality corpus is necessary for studies on NLI for Vietnamese, which can be considered a low-resource language. In this paper, we introduce ViNLI (Vietnamese Natural Language Inference), an open-domain and high-quality corpus for evaluating Vietnamese NLI models, which is created and evaluated with a strict process of quality control. ViNLI comprises over 30,000 human-annotated premise-hypothesis sentence pairs extracted from more than 800 online news articles on 13 distinct topics. In this paper, we introduce the guidelines for corpus creation which take the specific characteristics of the Vietnamese language in expressing entailment and contradiction into account. To evaluate the challenging level of our corpus, we conduct experiments with state-of-the-art deep neural networks and pre-trained models on our dataset. The best system performance is still far from human performance (a 14.20% gap in accuracy). The ViNLI corpus is a challenging corpus to accelerate progress in Vietnamese computational linguistics. Our corpus is available publicly for research purposes.


ViVQA: Vietnamese Visual Question Answering
Khanh Quoc Tran | An Trong Nguyen | An Tran-Hoai Le | Kiet Van Nguyen
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation

Monolingual vs multilingual BERTology for Vietnamese extractive multi-document summarization
Huy Quoc To | Kiet Van Nguyen | Ngan Luu-Thuy Nguyen | Anh Gia-Tuan Nguyen
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation


A Simple and Efficient Ensemble Classifier Combining Multiple Neural Network Models on Social Media Datasets in Vietnamese
Huy Duc Huynh | Hang Thi-Thuy Do | Kiet Van Nguyen | Ngan Thuy-Luu Nguyen
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation