@inproceedings{caselli-plaza-del-arco-2025-learning,
title = "Learning from Disagreement: Entropy-Guided Few-Shot Selection for Toxic Language Detection",
author = "Caselli, Tommaso and
Plaza-del-Arco, Flor Miriam",
editor = "Calabrese, Agostina and
de Kock, Christine and
Nozza, Debora and
Plaza-del-Arco, Flor Miriam and
Talat, Zeerak and
Vargas, Francielle",
booktitle = "Proceedings of the The 9th Workshop on Online Abuse and Harms (WOAH)",
month = aug,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/landing_page/2025.woah-1.5/",
pages = "53--66",
ISBN = "979-8-89176-105-6",
abstract = "In-context learning (ICL) has shown significant benefits, particularly in scenarios where large amounts of labeled data are unavailable. However, its effectiveness for highly subjective tasks, such as toxic language detection, remains an open question. A key challenge in this setting is to select shots to maximize performance. Although previous work has focused on enhancing variety and representativeness, the role of annotator disagreement in shot selection has received less attention. In this paper, we conduct an in-depth analysis of ICL using two families of open-source LLMs (Llama-3* and Qwen2.5) of varying sizes, evaluating their performance in five prominent English datasets covering multiple toxic language phenomena. We use disaggregated annotations and categorize different types of training examples to assess their impact on model predictions. We specifically investigate whether selecting shots based on annotators' entropy {--} focusing on ambiguous or difficult examples {--} can improve generalization in LLMs. Additionally, we examine the extent to which the order of examples in prompts influences model behavior.Our results show that selecting shots based on entropy from annotator disagreement can enhance ICL performance. Specifically, ambiguous shots with a median entropy value generally lead to the best results for our selected LLMs in the few-shot setting. However, ICL often underperforms when compared to fine-tuned encoders."
}
Markdown (Informal)
[Learning from Disagreement: Entropy-Guided Few-Shot Selection for Toxic Language Detection](https://aclanthology.org/2025.woah-1.5/) (Caselli & Plaza-del-Arco, WOAH 2025)
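The abstract's selection criterion is straightforward to reproduce. Below is a minimal sketch, not the authors' released code: it computes Shannon entropy over each example's disaggregated annotator labels and picks the k examples whose entropy is closest to the pool's median, mirroring the paper's finding that median-entropy (ambiguous) shots tend to work best. Function names, the toy annotation pool, and the upper-median tie-breaking are all illustrative assumptions.

```python
from collections import Counter
import math

def label_entropy(annotations):
    """Shannon entropy (bits) of one example's disaggregated annotator labels."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def median_entropy_shots(examples, k):
    """Select the k examples whose annotator-label entropy is closest to the median.

    `examples` is a list of (text, [annotator labels]) pairs; the names and the
    upper-median convention below are illustrative choices, not the paper's code.
    """
    scored = [(label_entropy(labels), text, labels) for text, labels in examples]
    entropies = sorted(e for e, _, _ in scored)
    median = entropies[len(entropies) // 2]  # upper median for even-sized pools
    scored.sort(key=lambda item: abs(item[0] - median))
    return [(text, labels) for _, text, labels in scored[:k]]

# Toy pool: three annotators per text; split votes yield higher entropy,
# so the two ambiguous examples are selected as shots.
pool = [
    ("you people are the worst", ["toxic", "toxic", "not_toxic"]),
    ("have a nice day", ["not_toxic", "not_toxic", "not_toxic"]),
    ("get lost, idiot", ["toxic", "toxic", "toxic"]),
    ("that take is garbage", ["toxic", "not_toxic", "not_toxic"]),
]
print(median_entropy_shots(pool, k=2))
```

Unanimous examples score 0 bits while 2-to-1 splits score about 0.918 bits, so ranking by distance to the median entropy surfaces the contested cases the paper identifies as the most useful few-shot demonstrations.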