Abstract
We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity: ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity. ViSim-400 provides degrees of similarity across five semantic relations, as rated by human judges. The two datasets are verified through standard co-occurrence and neural network models, showing results comparable to the respective English datasets.

- Anthology ID: N18-2032
- Volume: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
- Month: June
- Year: 2018
- Address: New Orleans, Louisiana
- Editors: Marilyn Walker, Heng Ji, Amanda Stent
- Venue: NAACL
- Publisher: Association for Computational Linguistics
- Pages: 199–205
- URL: https://aclanthology.org/N18-2032
- DOI: 10.18653/v1/N18-2032
- Cite (ACL): Kim Anh Nguyen, Sabine Schulte im Walde, and Ngoc Thang Vu. 2018. Introducing Two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 199–205, New Orleans, Louisiana. Association for Computational Linguistics.
- Cite (Informal): Introducing Two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness (Nguyen et al., NAACL 2018)
- PDF: https://preview.aclanthology.org/naacl24-info/N18-2032.pdf