BrightCookies at SemEval-2025 Task 9: Exploring Data Augmentation for Food Hazard Classification

Foteini Papadopoulou; Osman Mutlu; Neris Özen; Bas Van Der Velden; Iris Hendrickx; Ali Hürriyetoğlu

Abstract

This paper presents our system developed for SemEval-2025 Task 9: The Food Hazard Detection Challenge. The shared task's objective is to evaluate explainable classification systems that classify hazards and products at two levels of granularity from web-collected food recall incident reports. In this work, we propose text augmentation techniques as a way to improve poor performance on minority classes and compare their effect for each category across various transformer and machine learning models. We apply three word-level data augmentation techniques, namely synonym replacement, random word swapping, and contextual word insertion utilizing BERT. The results show that transformer models tend to have better overall performance. Meanwhile, comparing the BERT baseline model with the three augmented models revealed a statistically significant improvement (p < 0.05) in the fine-grained categories, with the augmented models achieving a 6% increase in correct predictions for minority hazard classes. This suggests that targeted augmentation of minority classes can improve the performance of transformer models.
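The abstract names the three word-level augmentation operations and the idea of targeting minority classes, but not how they are implemented. The following is a minimal sketch under our own assumptions, not the authors' code: the toy `SYNONYMS` table, the helper names, the one-edit-per-copy setting, and the `min_count` threshold that defines a "minority" class are all hypothetical. Contextual insertion is written against any masked-language-model callable that returns a ranked list of dicts with a `"token_str"` key, which is the shape a Hugging Face `fill-mask` pipeline returns.

```python
import random
from collections import Counter

# Hypothetical toy synonym table; a real system might draw synonyms from WordNet.
SYNONYMS = {
    "contaminated": ["tainted", "polluted"],
    "recall": ["withdrawal"],
    "product": ["item", "good"],
}


def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in SYNONYMS."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i].lower()])
    return out


def random_swap(words, n, rng):
    """Swap the positions of two randomly chosen words, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def contextual_insertion(words, n, rng, fill_mask, mask_token="[MASK]"):
    """Insert n words predicted by a masked language model.

    fill_mask takes a string containing one mask token and returns a ranked
    list of dicts with a "token_str" key; the top prediction is inserted.
    """
    out = list(words)
    for _ in range(n):
        pos = rng.randint(0, len(out))  # any slot, including both ends
        masked = out[:pos] + [mask_token] + out[pos:]
        best = fill_mask(" ".join(masked))[0]["token_str"].strip()
        out.insert(pos, best)
    return out


def augment_minority(texts, labels, min_count, rng, ops):
    """Add augmented copies for every class with fewer than min_count examples.

    Each new copy applies one operation (cycling through ops) with n=1 to an
    original example of that class; majority classes are left untouched.
    """
    counts = Counter(labels)
    new_texts, new_labels = list(texts), list(labels)
    for label, count in counts.items():
        pool = [t for t, l in zip(texts, labels) if l == label]
        for k in range(max(0, min_count - count)):
            words = pool[k % len(pool)].split()
            op = ops[k % len(ops)]
            new_texts.append(" ".join(op(words, 1, rng)))
            new_labels.append(label)
    return new_texts, new_labels
```

To include BERT-based insertion, one could bind a model with `functools.partial(contextual_insertion, fill_mask=pipeline("fill-mask", model="bert-base-uncased"))` from the `transformers` library and add it to `ops`. The choice of model here is also an assumption. Augmenting only classes below a threshold keeps the majority classes unchanged, which is one simple reading of "targeted augmentation of minority classes".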

Anthology ID: 2025.semeval-1.124
Volume: Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Sara Rosenthal, Aiala Rosá, Debanjan Ghosh, Marcos Zampieri
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 914–930
URL: https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.124/
Cite (ACL): Foteini Papadopoulou, Osman Mutlu, Neris Özen, Bas Van Der Velden, Iris Hendrickx, and Ali Hurriyetoglu. 2025. BrightCookies at SemEval-2025 Task 9: Exploring Data Augmentation for Food Hazard Classification. In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 914–930, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): BrightCookies at SemEval-2025 Task 9: Exploring Data Augmentation for Food Hazard Classification (Papadopoulou et al., SemEval 2025)
PDF: https://preview.aclanthology.org/corrections-2025-08/2025.semeval-1.124.pdf
