@inproceedings{patel-etal-2025-evaluating,
title = "Evaluating Large Language Models for Detecting Antisemitism",
author = "Patel, Jay and
Mehta, Hrudayangam and
Blackburn, Jeremy",
editor = "Christodoulopoulos, Christos and
Chakraborty, Tanmoy and
Rose, Carolyn and
Peng, Violet",
booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/ingest-emnlp/2025.emnlp-main.1792/",
pages = "35356--35385",
ISBN = "979-8-89176-332-6",
abstract = "Detecting hateful content is a challenging and important problem. Automated tools, like machine{-}learning models, can help, but they require continuous training to adapt to the ever-changing landscape of social media. In this work, we evaluate eight open-source LLMs' capability to detect antisemitic content, specifically leveraging in-context definition as a policy guideline. We explore various prompting techniques and design a new CoT-like prompt, Guided-CoT. Guided{-}CoT handles the in-context policy well, increasing performance across all evaluated models, regardless of decoding configuration, model sizes, or reasoning capability. Notably, Llama 3.1 70B outperforms fine-tuned GPT-3.5. Additionally, we examine LLM errors and introduce metrics to quantify semantic divergence in model-generated rationales, revealing notable differences and paradoxical behaviors among LLMs. Our experiments highlight the differences observed across LLMs' utility, explainability, and reliability."
}
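For a concrete picture of the setup the abstract describes, below is a minimal, hypothetical sketch of a Guided-CoT-style prompt that embeds an in-context policy definition and guided reasoning steps before requesting a label. The function name, step wording, and placeholder definition are illustrative assumptions, not the authors' actual prompt or code.

```python
# Hypothetical sketch of a Guided-CoT-style prompt builder.
# The policy text, step wording, and names here are placeholders;
# the paper supplies its own in-context definition and prompt design.

POLICY_DEFINITION = (
    "Placeholder policy text: the definition of antisemitism used as "
    "the in-context guideline goes here."
)

GUIDED_STEPS = [
    "Identify who or what the post targets.",
    "Check whether the post falls under the policy definition above.",
    "Explain your reasoning briefly.",
    "Answer with a final label: ANTISEMITIC or NOT ANTISEMITIC.",
]

def build_guided_cot_prompt(post: str) -> str:
    """Assemble a prompt combining the policy definition, the post, and guided steps."""
    steps = "\n".join(f"{i}. {s}" for i, s in enumerate(GUIDED_STEPS, 1))
    return (
        f"Policy definition:\n{POLICY_DEFINITION}\n\n"
        f"Post:\n{post}\n\n"
        f"Follow these steps and show your reasoning:\n{steps}"
    )

if __name__ == "__main__":
    print(build_guided_cot_prompt("example social media post"))
```

The resulting string would be sent to each evaluated LLM; the study itself compares such prompting variants across models, decoding configurations, and sizes.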