Abstract
This paper introduces ViHealthNLI, a large dataset for natural language inference (NLI) in Vietnamese. Unlike similar Vietnamese datasets, ours is specific to the healthcare domain. We conducted an exploratory analysis to characterize the dataset and evaluated state-of-the-art methods on it. Our findings indicate that the dataset poses significant challenges while also holding promise for further research and the development of practical applications.
- Anthology ID:
- 2024.sigul-1.48
- Volume:
- Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Maite Melero, Sakriani Sakti, Claudia Soria
- Venues:
- SIGUL | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 404–409
- URL:
- https://aclanthology.org/2024.sigul-1.48
- Cite (ACL):
- Huyen Nguyen, Quyen The Ngo, Thanh-Ha Do, and Tuan-Anh Hoang. 2024. ViHealthNLI: A Dataset for Vietnamese Natural Language Inference in Healthcare. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 404–409, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- ViHealthNLI: A Dataset for Vietnamese Natural Language Inference in Healthcare (Nguyen et al., SIGUL-WS 2024)
- PDF:
- https://preview.aclanthology.org/landing_page/2024.sigul-1.48.pdf