Neris Özen


2025

pdf bib
BrightCookies at SemEval-2025 Task 9: Exploring Data Augmentation for Food Hazard Classification
Foteini Papadopoulou | Osman Mutlu | Neris Özen | Bas Van Der Velden | Iris Hendrickx | Ali Hurriyetoglu
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents our system developed for the SemEval-2025 Task 9: The Food Hazard Detection Challenge. The shared task’s objective is to evaluate explainable classification systems for classifying hazards and products in two levels of granularity from web-collected food recall incident reports. In this work, we propose text augmentation techniques as a way to improve poor performance in minority classes and compare their effect for each category on various transformer and machine learning models. We apply three word-level data augmentation techniques, namely synonym replacement, random word swapping, and contextual word insertion utilizing BERT. The resultsshow that transformer models tend to have a better overall performance. Meanwhile, a statistically significant improvement (P 0.05) was observed in the fine-grained categories when using BERT to compare the baseline model with the three augmented models, which achieved a 6% increase in correct predictions for minority hazard classes. This suggests that targeted augmentation of minority classes can improve the performance of transformer models.