Francesco Pierri
2025
Conspiracy Theories and Where to Find Them on TikTok
Francesco Corso
|
Francesco Pierri
|
Gianmarco De Francisci Morales
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
TikTok has skyrocketed in popularity over recent years, especially among younger audiences. However, there are public concerns about the potential of this platform to promote and amplify harmful content. This study presents the first systematic analysis of conspiracy theories on TikTok. By leveraging the official TikTok Research API we collect a longitudinal dataset of 1.5M videos shared in the U.S. over three years. We estimate a lower bound on the prevalence of conspiratorial videos (up to 1000 new videos per month) and evaluate the effects of TikTok’s Creativity Program for monetization, observing an overall increase in video duration regardless of content. Lastly, we evaluate the capabilities of state-of-the-art open-weight Large Language Models to identify conspiracy theories from audio transcriptions of videos. While these models achieve high precision in detecting harmful content (up to 96%), their overall performance remains comparable to fine-tuned traditional models such as RoBERTa. Our findings suggest that Large Language Models can serve as an effective tool for supporting content moderation strategies aimed at reducing the spread of harmful content on TikTok.
Can I Introduce My Boyfriend to My Grandmother? Evaluating Large Language Models Capabilities on Iranian Social Norm Classification
Hamidreza Saffari
|
Mohammadamin Shafiei
|
Donya Rooein
|
Francesco Pierri
|
Debora Nozza
Findings of the Association for Computational Linguistics: NAACL 2025
Creating globally inclusive AI systems demands datasets reflecting diverse social norms. Iran, with its unique cultural blend, offers an ideal case study, with Farsi adding linguistic complexity. In this work, we introduce the Iranian Social Norms (ISN) dataset, a novel collection of 1,699 Iranian social norms, including environments, demographic features, and scope annotation, alongside English translations. Our evaluation of 6 Large Language Models (LLMs) in classifying Iranian social norms, using a variety of prompts, uncovered critical insights into the impact of geographic and linguistic context. Results revealed a substantial performance gap in LLMs’ comprehension of Iranian norms. Notably, while the geographic context in English prompts enhanced the performance, this effect was absent in Farsi, pointing to nuanced linguistic challenges. Particularly, performance was significantly worse for Iran-specific norms, emphasizing the importance of culturally tailored datasets. As the first Farsi dataset for social norm classification, ISN will facilitate crucial cross-cultural analyses, shedding light on how values differ across contexts and cultures.