Jasmin Heierli


2025

We evaluate four representative large language models, namely GPT-4o, Gemini, Llama, and DeepSeek, on a suite of linguistic and cultural tasks in Persian, covering grammar, paraphrasing, inference, translation, factual recall, analogical reasoning, and a Hofstede-based cultural probe under direct and role-based prompts. Our findings reveal consistent performance declines, alongside systematic misalignment with Iranian cultural norms. Role-based prompting yields modest improvements but does not fully restore cultural fidelity. We conclude that advancing truly multilingual models demands richer Persian resources, targeted adaptation, and evaluation frameworks that jointly assess fluency and cultural alignment.

2024

This paper details our participation in the FIGNEWS-2024 shared task on bias and propaganda annotation in Gaza conflict news. Our objectives were to develop robust guidelines and annotate a substantial dataset to enhance bias detection. We iteratively refined our guidelines and used examples for clarity. Key findings include the challenges in achieving high inter-annotator agreement and the importance of annotator awareness of their own biases. We also explored the integration of ChatGPT as an annotator to support consistency. This paper contributes to the field by providing detailed annotation guidelines and offering insights into the subjectivity of bias annotation.