BrightCookies at SemEval-2025 Task 9: Exploring Data Augmentation for Food Hazard Classification
Foteini Papadopoulou, Osman Mutlu, Neris Özen, Bas Van Der Velden, Iris Hendrickx, Ali Hurriyetoglu
Abstract
This paper presents our system developed for SemEval-2025 Task 9: The Food Hazard Detection Challenge. The shared task's objective is to evaluate explainable classification systems for classifying hazards and products at two levels of granularity from web-collected food recall incident reports. In this work, we propose text augmentation techniques as a way to improve poor performance in minority classes and compare their effect for each category on various transformer and machine learning models. We apply three word-level data augmentation techniques, namely synonym replacement, random word swapping, and contextual word insertion utilizing BERT. The results show that transformer models tend to have a better overall performance. Meanwhile, a statistically significant improvement (P < 0.05) was observed in the fine-grained categories when using BERT to compare the baseline model with the three augmented models, which achieved a 6% increase in correct predictions for minority hazard classes. This suggests that targeted augmentation of minority classes can improve the performance of transformer models.
- Anthology ID:
- 2025.semeval-1.124
- Volume:
- Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
- Month:
- July
- Year:
- 2025
- Address:
- Vienna, Austria
- Editors:
- Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
- Venues:
- SemEval | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 914–930
- URL:
- https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.124/
- Cite (ACL):
- Foteini Papadopoulou, Osman Mutlu, Neris Özen, Bas Van Der Velden, Iris Hendrickx, and Ali Hurriyetoglu. 2025. BrightCookies at SemEval-2025 Task 9: Exploring Data Augmentation for Food Hazard Classification. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 914–930, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal):
- BrightCookies at SemEval-2025 Task 9: Exploring Data Augmentation for Food Hazard Classification (Papadopoulou et al., SemEval 2025)
- PDF:
- https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.124.pdf
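
Illustrative sketch (not the authors' code): of the three augmentation techniques named in the abstract, random word swapping is the simplest to show without external resources. The Python below is a minimal sketch of how minority classes might be topped up with swapped copies of their own examples. The function names `random_swap` and `augment_minority`, the `min_count` threshold, and the sample texts and labels are all hypothetical and are not taken from the paper or the task data.

```python
import random
from collections import Counter


def random_swap(text, n_swaps=1, rng=None):
    """Return a copy of `text` with `n_swaps` random pairs of words exchanged."""
    rng = rng or random.Random()
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)  # two distinct positions
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


def augment_minority(texts, labels, min_count, n_swaps=1, seed=0):
    """Add swapped copies so every class has at least `min_count` examples.

    Assumption: a class counts as a "minority" when it has fewer than
    `min_count` examples; the paper may define this differently.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    out_texts, out_labels = list(texts), list(labels)
    for label, count in counts.items():
        if count >= min_count:
            continue
        pool = [t for t, l in zip(texts, labels) if l == label]
        for k in range(min_count - count):
            source = pool[k % len(pool)]  # cycle through the originals
            out_texts.append(random_swap(source, n_swaps, rng))
            out_labels.append(label)
    return out_texts, out_labels


# Made-up examples for illustration only.
texts = [
    "salmonella found in chicken",
    "glass in jar",
    "listeria in cheese",
    "salmonella in eggs",
]
labels = ["biological", "foreign bodies", "biological", "biological"]
aug_texts, aug_labels = augment_minority(texts, labels, min_count=3)
print(Counter(aug_labels))
```

Only the under-represented class receives new examples here, which mirrors the targeted augmentation of minority classes that the abstract describes. Synonym replacement and BERT-based contextual insertion would follow the same pattern, with `random_swap` replaced by a different word-level transformation.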