Abstract
Despite the lack of comprehensive exploration of emotional connotations, sentiment analysis, which categorizes data as positive or negative, has been widely employed to identify emotional aspects in texts. Recently, corpora labeled with more than just valence or polarity have been built to surpass this limitation. However, most Korean emotion corpora are limited by their small size and narrow range of emotions covered. In this paper, we introduce the KOTE dataset. The KOTE dataset comprises 50,000 Korean online comments, totaling 250,000 cases, each manually labeled for 43 emotions and NO EMOTION through crowdsourcing. The taxonomy for the 43 emotions was systematically derived through cluster analysis of Korean emotion concepts within the word embedding space. After detailing the development of KOTE, we further discuss the results of fine-tuning, as well as analysis for social discrimination within the corpus.- Anthology ID:
- 2024.lrec-main.1499
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- SIG:
- Publisher:
- ELRA and ICCL
- Note:
- Pages:
- 17254–17270
- Language:
- URL:
- https://preview.aclanthology.org/remove-affiliations/2024.lrec-main.1499/
- DOI:
- Cite (ACL):
- Duyoung Jeon, Junho Lee, and Cheongtag Kim. 2024. User Guide for KOTE: Korean Online That-gul Emotions Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17254–17270, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- User Guide for KOTE: Korean Online That-gul Emotions Dataset (Jeon et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/remove-affiliations/2024.lrec-main.1499.pdf