Enhancing Training Data Quality through Influence Scores for Generalizable Classification: A Case Study on Sexism Detection

Rabiraj Bandyopadhyay, Dennis Assenmacher, Jose Maria Alonso-Moral, Claudia Wagner


Abstract
The quality of training data is crucial for the performance of supervised machine learning models. In particular, poor annotation quality and spurious correlations between labels and features in text datasets can significantly degrade model generalization. This problem is especially pronounced in harmful language detection, where prior studies have revealed major deficiencies in existing datasets. In this work, we design and test data selection methods based on learnability measures to improve dataset quality. Using a sexism dataset with counterfactuals designed to avoid spurious correlations, we show that pruning with EL2N and PVI scores can lead to significant performance gains and outperforms submodular and random selection. Our analysis reveals that, in the presence of label imbalance, models rely on dataset shortcuts; in particular, easy-to-classify sexist instances and hard-to-classify non-sexist instances contain shortcuts. Pruning these instances leads to performance increases. Pruning hard-to-classify instances is also a promising strategy in general when shortcuts are not present.
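The abstract describes pruning training data with EL2N and PVI learnability scores. The sketch below illustrates how such score-based pruning could look, assuming the standard definitions of these measures (EL2N as the L2 norm of the error between softmax outputs from an early-training checkpoint and the one-hot label; PVI as the pointwise gain in log-probability of the label from seeing the input versus a null input). This is a minimal illustration under those assumptions, not the authors' implementation, and all function names here are hypothetical.

import numpy as np

def el2n_scores(probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """EL2N: L2 norm of the error vector (softmax probabilities minus one-hot label).

    probs:  (n_examples, n_classes) softmax outputs from an early-training checkpoint,
            typically averaged over several random seeds.
    labels: (n_examples,) integer class labels.
    Higher score = harder-to-classify instance.
    """
    one_hot = np.eye(probs.shape[1])[labels]
    return np.linalg.norm(probs - one_hot, axis=1)

def pvi_scores(logp_with_input: np.ndarray, logp_null_input: np.ndarray) -> np.ndarray:
    """Pointwise V-information in bits: log2 p(y | x) - log2 p(y | null input),
    where the two (natural-log) label probabilities come from a model fine-tuned
    with the real input and one fine-tuned with an empty/null input, respectively.
    Higher score = more learnable (easier) instance.
    """
    return (logp_with_input - logp_null_input) / np.log(2)

def prune_by_score(scores: np.ndarray, keep_fraction: float = 0.8, keep: str = "lowest") -> np.ndarray:
    """Return indices of the examples to keep.

    For EL2N, keep="lowest" drops the hardest instances; for PVI, keep="highest"
    keeps the most learnable ones. The choice of which end to prune depends on
    whether shortcuts are expected in easy or hard instances.
    """
    order = np.argsort(scores)
    n_keep = int(len(scores) * keep_fraction)
    return order[:n_keep] if keep == "lowest" else order[-n_keep:]

As a usage sketch, one would score every training instance with an early checkpoint (EL2N) or a pair of fine-tuned models (PVI), then retrain the classifier only on the indices returned by prune_by_score.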
Anthology ID:
2025.ijcnlp-long.77
Volume:
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Kentaro Inui, Sakriani Sakti, Haofen Wang, Derek F. Wong, Pushpak Bhattacharyya, Biplab Banerjee, Asif Ekbal, Tanmoy Chakraborty, Dhirendra Pratap Singh
Venues:
IJCNLP | AACL
Publisher:
The Asian Federation of Natural Language Processing and The Association for Computational Linguistics
Pages:
1382–1403
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-long.77/
Cite (ACL):
Rabiraj Bandyopadhyay, Dennis Assenmacher, Jose Maria Alonso-Moral, and Claudia Wagner. 2025. Enhancing Training Data Quality through Influence Scores for Generalizable Classification: A Case Study on Sexism Detection. In Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics, pages 1382–1403, Mumbai, India. The Asian Federation of Natural Language Processing and The Association for Computational Linguistics.
Cite (Informal):
Enhancing Training Data Quality through Influence Scores for Generalizable Classification: A Case Study on Sexism Detection (Bandyopadhyay et al., IJCNLP-AACL 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-long.77.pdf