Dominik Sipek
2025
What Makes You CLIC: Detection of Croatian Clickbait Headlines
Marija Andelic | Dominik Sipek | Laura Majer | Jan Snajder
Proceedings of the 10th Workshop on Slavic Natural Language Processing (Slavic NLP 2025)
Online news outlets operate predominantly on an advertising-based revenue model, compelling journalists to create headlines that are often scandalous, intriguing, and provocative, commonly referred to as clickbait. Automatic detection of clickbait headlines is essential for preserving information quality and reader trust in digital media, and it requires both contextual understanding and world knowledge. For this task, particularly in less-resourced languages, it remains unclear whether fine-tuning or in-context learning (ICL) yields better results. In this paper, we compile CLIC, a novel dataset for clickbait detection in Croatian news headlines, spanning a 20-year period and encompassing mainstream and fringe outlets. Furthermore, we fine-tune the BERTić model on the task of clickbait detection for Croatian and compare its performance to LLM-based ICL methods with prompts in both Croatian and English. Finally, we analyze the linguistic properties of clickbait. We find that nearly half of the analyzed headlines contain clickbait and that fine-tuned models deliver better results than general LLMs.