Abstract
This paper presents our models for the Social Media Mining for Health 2024 shared task, specifically Task 5, which involves classifying tweets that report a child with a childhood disorder (annotated as “1”) versus those that merely mention a disorder (annotated as “0”). We used a classification model enhanced with diverse textual and language model-based augmentations. To ensure the quality of the augmented data, we used semantic similarity, perplexity, and lexical diversity as evaluation metrics. Combining supervised contrastive learning and cross-entropy-based learning, our best model, which incorporates R-drop and various LM generation-based augmentations, achieved an F1 score of 0.9230 on the test set, surpassing the task mean and median scores.
- Anthology ID:
- 2024.smm4h-1.33
- Volume:
- Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks
- Month:
- August
- Year:
- 2024
- Address:
- Bangkok, Thailand
- Editors:
- Dongfang Xu, Graciela Gonzalez-Hernandez
- Venues:
- SMM4H | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 142–145
- URL:
- https://aclanthology.org/2024.smm4h-1.33
- Cite (ACL):
- Sumam Francis and Marie-Francine Moens. 2024. KUL@SMM4H2024: Optimizing Text Classification with Quality-Assured Augmentation Strategies. In Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks, pages 142–145, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal):
- KUL@SMM4H2024: Optimizing Text Classification with Quality-Assured Augmentation Strategies (Francis & Moens, SMM4H-WS 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.smm4h-1.33.pdf