Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
Berk Atil, Vipul Gupta, Sarkar Snigdha Sarathi Das, Rebecca Passonneau
Abstract
Large language models (LLMs) have become ubiquitous, so it is important to understand their risks and limitations, such as their propensity to generate harmful output. This includes smaller LLMs, which are important for settings with constrained compute resources, such as edge devices. Detection of LLM harm typically requires human annotation, which is expensive to collect. This work studies two questions: How do smaller LLMs rank with respect to generating harmful content? How well can larger LLMs annotate harmfulness? We prompt three small LLMs to elicit harmful content of various types, such as discriminatory language, offensive content, privacy invasion, or negative influence, and collect human rankings of their outputs. Then we compare harm annotations from three state-of-the-art large LLMs with each other and with humans. We find that the smaller models differ with respect to harmfulness. We also find that large LLMs show low to moderate agreement with humans.
- Anthology ID: 2025.woah-1.30
- Volume: Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH)
- Month: August
- Year: 2025
- Address: Vienna, Austria
- Editors: Agostina Calabrese, Christine de Kock, Debora Nozza, Flor Miriam Plaza-del-Arco, Zeerak Talat, Francielle Vargas
- Venues: WOAH | WS
- Publisher: Association for Computational Linguistics
- Pages: 342–354
- URL: https://preview.aclanthology.org/landing_page/2025.woah-1.30/
- Cite (ACL): Berk Atil, Vipul Gupta, Sarkar Snigdha Sarathi Das, and Rebecca Passonneau. 2025. Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet. In Proceedings of the 9th Workshop on Online Abuse and Harms (WOAH), pages 342–354, Vienna, Austria. Association for Computational Linguistics.
- Cite (Informal): Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet (Atil et al., WOAH 2025)
- PDF: https://preview.aclanthology.org/landing_page/2025.woah-1.30.pdf