Few-Shot Prompting, Full-Scale Confusion: Evaluating Large Language Models for Humor Detection in Croatian Tweets

Petra Bago, Nikola Bakarić


Abstract
Humor detection in low-resource languages is hampered by cultural nuance and subjective annotation. We test two large language models, GPT-4 and Gemini 2.5 Flash, on labeling humor in 6,000 Croatian tweets with expert gold labels generated through a rigorous annotation pipeline. LLM–human agreement (κ = 0.28) matches human–human agreement (κ = 0.27), while LLM–LLM agreement is substantially higher (κ = 0.63). Although concordance with expert adjudication is lower, additional metrics imply that the models equal a second human annotator while working far faster and at negligible cost. These findings suggest that, even with simple prompting, LLMs can efficiently bootstrap subjective datasets and serve as practical annotation assistants in linguistically under-represented settings.
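The agreement figures above are Cohen's kappa scores. As an illustrative sketch (not the authors' actual pipeline, and using toy labels rather than the paper's data), kappa between two annotators' binary humor labels can be computed as follows:

```python
# Sketch: Cohen's kappa for two annotators' binary labels (toy data).
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # Observed agreement: fraction of items both annotators label the same.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical labels: 1 = humorous, 0 = not humorous.
human = [1, 0, 1, 1, 0, 0, 1, 0]
llm   = [1, 0, 0, 1, 0, 1, 1, 0]
print(round(cohens_kappa(human, llm), 2))  # → 0.5
```

Kappa corrects raw agreement for chance, which is why a κ around 0.27–0.28 indicates only fair agreement even though the annotators may coincide on most items.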
Anthology ID:
2025.bsnlp-1.2
Volume:
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Jakub Piskorski, Pavel Přibáň, Preslav Nakov, Roman Yangarber, Michal Marcinczuk
Venues:
BSNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
9–16
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.2/
Cite (ACL):
Petra Bago and Nikola Bakarić. 2025. Few-Shot Prompting, Full-Scale Confusion: Evaluating Large Language Models for Humor Detection in Croatian Tweets. In Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025), pages 9–16, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Few-Shot Prompting, Full-Scale Confusion: Evaluating Large Language Models for Humor Detection in Croatian Tweets (Bago & Bakarić, BSNLP 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.bsnlp-1.2.pdf