Abstract
Social media platforms have become essential in daily life, enabling users to express their opinions and stances on various topics. Stance detection, which identifies the viewpoint expressed in text toward a target, has predominantly focused on English. MAWQIF is the pioneering Arabic dataset for target-specific stance detection, consisting of 4,121 tweets annotated with stance, sentiment, and sarcasm. The original dataset, benchmarked on four BERT-based models, achieved a best macro-F1 score of 78.89, indicating significant room for improvement. This study evaluates the effectiveness of three Large Language Models (LLMs) in detecting target-specific stances in MAWQIF. The LLMs assessed are ChatGPT-3.5-turbo, Meta-Llama-3-8B-Instruct, and Falcon-7B-Instruct. Performance was measured using both zero-shot and full fine-tuning approaches. Our findings demonstrate that fine-tuning substantially enhances the stance detection capabilities of LLMs in Arabic tweets. Notably, GPT-3.5-Turbo achieved the highest performance with a macro-F1 score of 82.93, underscoring the potential of fine-tuned LLMs for language-specific applications.