Huyen Nguyen


2024

ViHealthNLI: A Dataset for Vietnamese Natural Language Inference in Healthcare
Huyen Nguyen | Quyen The Ngo | Thanh-Ha Do | Tuan-Anh Hoang
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024

This paper introduces ViHealthNLI, a large dataset for the natural language inference problem in Vietnamese. Unlike similar Vietnamese datasets, ours is specific to the healthcare domain. We conducted an exploratory analysis to characterize the dataset and evaluated state-of-the-art methods on it. Our findings indicate that the dataset poses significant challenges while also holding promise for further advanced research and the creation of practical applications.

2022

Universal Proposition Bank 2.0
Ishan Jindal | Alexandre Rademaker | Michał Ulewicz | Ha Linh | Huyen Nguyen | Khoi-Nguyen Tran | Huaiyu Zhu | Yunyao Li
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Semantic role labeling (SRL) represents the meaning of a sentence in the form of predicate-argument structures. Such shallow semantic analysis is helpful in a wide range of downstream NLP tasks and real-world applications. Just as treebanks enabled the development of powerful syntactic parsers, accurate predicate-argument analysis demands training data in the form of propbanks. Unfortunately, most languages do not have corresponding propbanks because of the high cost of constructing such resources. To overcome this challenge, Universal Proposition Bank 1.0 (UP1.0) was released in 2017, with high-quality propbank data generated via a two-stage method exploiting monolingual SRL and multilingual parallel data. In this paper, we introduce Universal Proposition Bank 2.0 (UP2.0), with significant enhancements over UP1.0: (1) higher-quality propbanks, obtained by using a state-of-the-art monolingual SRL and improved auto-generation of annotations; (2) expanded language coverage (from 7 to 9 languages); (3) span annotation for decoupling from syntactic analysis; and (4) gold data for a subset of the languages. We also share experimental results that confirm the significant quality improvements of the generated propbanks. In addition, we present a comprehensive experimental evaluation of how different implementation choices affect the quality of the resulting data. We release these resources to the research community and hope to encourage more research on cross-lingual SRL.

2020

ReINTEL: A Multimodal Data Challenge for Responsible Information Identification on Social Network Sites
Duc-Trong Le | Xuan-Son Vu | Nhu-Dung To | Huu-Quang Nguyen | Thuy-Trinh Nguyen | Thi Khanh-Linh Le | Anh-Tuan Nguyen | Minh-Duc Hoang | Nghia Le | Huyen Nguyen | Hoang D. Nguyen
Proceedings of the 7th International Workshop on Vietnamese Language and Speech Processing

2019

A Case Study on Meaning Representation for Vietnamese
Ha Linh | Huyen Nguyen
Proceedings of the First International Workshop on Designing Meaning Representations

This paper presents a case study on meaning representation for Vietnamese. Having introduced several existing semantic representation schemes for different languages, we select AMR (Abstract Meaning Representation) as the basis for our work on Vietnamese. Building on it, we define a meaning representation label set by adapting the English schema and taking into account the specific characteristics of Vietnamese.