Tin Van Huynh


2025

ViNumFCR: A Novel Vietnamese Benchmark for Numerical Reasoning Fact Checking on Social Media News
Nhi Ngoc Phuong Luong | Anh Thi Lan Le | Tin Van Huynh | Kiet Van Nguyen | Ngan Nguyen
Proceedings of the 18th International Natural Language Generation Conference

In the digital era, the internet provides rapid and convenient access to vast amounts of information. However, much of this information remains unverified, and falsified numerical data in particular is increasingly prevalent, leading to public confusion and negative societal impacts. To address this issue, we developed ViNumFCR, the first dataset dedicated to fact-checking numerical information in Vietnamese. The dataset comprises over 10,000 samples collected and constructed from online newspapers across 12 different topics. We assessed the performance of various fact-checking models, including Pretrained Language Models and Large Language Models, alongside retrieval techniques for gathering supporting evidence. Experimental results demonstrate that the XLM-R_Large model achieved the highest accuracy of 90.05% on the fact-checking task, while the combined SBERT + BM25 model attained a precision of over 97% on the evidence retrieval task. Additionally, we conducted an in-depth analysis of the linguistic features of the dataset to understand the factors influencing model performance. The ViNumFCR dataset is publicly available to support further research.

Reading the Signs: A Graph-Based System for Multimodal Information Retrieval on Vietnamese Traffic Law
Hieu Minh Huynh | An Nguyen Tran Khuong | Dai Phan Trong | Tin Van Huynh
Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing

2023

Machine Reading Comprehension for Vietnamese Customer Reviews: Task, Corpus and Baseline Models
Tinh Pham Phuc Do | Ngoc Dinh Duy Cao | Nhan Thanh Nguyen | Tin Van Huynh | Kiet Van Nguyen
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

2022

ViNLI: A Vietnamese Corpus for Studies on Open-Domain Natural Language Inference
Tin Van Huynh | Kiet Van Nguyen | Ngan Luu-Thuy Nguyen
Proceedings of the 29th International Conference on Computational Linguistics

Over the past decade, the field of computational linguistics has witnessed the growth of corpora and models for natural language inference (NLI) for resource-rich languages such as English and Chinese. A large-scale, high-quality corpus is necessary for studies on NLI for Vietnamese, which can be considered a low-resource language. In this paper, we introduce ViNLI (Vietnamese Natural Language Inference), an open-domain, high-quality corpus for evaluating Vietnamese NLI models, which is created and evaluated with a strict process of quality control. ViNLI comprises over 30,000 human-annotated premise-hypothesis sentence pairs extracted from more than 800 online news articles on 13 distinct topics. We also present the guidelines for corpus creation, which take into account the specific characteristics of the Vietnamese language in expressing entailment and contradiction. To evaluate how challenging our corpus is, we conduct experiments with state-of-the-art deep neural networks and pre-trained models on our dataset. The best system performance is still far from human performance (a 14.20% gap in accuracy). ViNLI is a challenging corpus intended to accelerate progress in Vietnamese computational linguistics. Our corpus is publicly available for research purposes.