Rabiraj Bandyopadhyay


2025

Enhancing Training Data Quality through Influence Scores for Generalizable Classification: A Case Study on Sexism Detection
Rabiraj Bandyopadhyay | Dennis Assenmacher | Jose Maria Alonso-Moral | Claudia Wagner
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

The quality of training data is crucial for the performance of supervised machine learning models. In particular, poor annotation quality and spurious correlations between labels and features in text datasets can significantly degrade model generalization. This problem is especially pronounced in harmful language detection, where prior studies have revealed major deficiencies in existing datasets. In this work, we design and test data selection methods based on learnability measures to improve dataset quality. Using a sexism dataset with counterfactuals designed to avoid spurious correlations, we show that pruning with EL2N and PVI scores can lead to significant performance increases and outperforms submodular and random selection. Our analysis reveals that, in the presence of label imbalance, models rely on dataset shortcuts; easy-to-classify sexist instances and hard-to-classify non-sexist instances in particular contain shortcuts. Pruning these instances leads to performance increases. Pruning hard-to-classify instances is, in general, also a promising strategy when shortcuts are not present.