Nouf Alshalawi


2024

pdf
muNERa at WojoodNER 2024: Multi-tasking NER Approach
Nouf Alotaibi | Haneen Alhomoud | Hanan Murayshid | Waad Alshammari | Nouf Alshalawi | Sakhar Alkhereyf
Proceedings of The Second Arabic Natural Language Processing Conference

This paper presents our system “muNERa”, submitted to the WojoodNER 2024 shared task at the second ArabicNLP conference. We participated in two subtasks, the flat and nested fine-grained NER sub-tasks (1 and 2). muNERa achieved first place in the nested NER sub-task and second place in the flat NER sub-task. The system is based on the TANL framework (CITATION),by using a sequence-to-sequence structured language translation approach to model both tasks. We utilize the pre-trained AraT5v2-base model as the base model for the TANL framework. The best-performing muNERa model achieves 91.07% and 90.26% for the F-1 scores on the test sets for the nested and flat subtasks, respectively.

2023

pdf
Evaluating ChatGPT and Bard AI on Arabic Sentiment Analysis
Abdulmohsen Al-Thubaity | Sakhar Alkhereyf | Hanan Murayshid | Nouf Alshalawi | Maha Omirah | Raghad Alateeq | Rawabi Almutairi | Razan Alsuwailem | Manal Alhassoun | Imaan Alkhanen
Proceedings of ArabicNLP 2023

Large Language Models (LLMs) such as ChatGPT and Bard AI have gained much attention due to their outstanding performance on a range of NLP tasks. These models have demonstrated remarkable proficiency across various languages without the necessity for full supervision. Nevertheless, their performance in low-resource languages and dialects, like Arabic dialects in comparison to English, remains to be investigated. In this paper, we conduct a comprehensive evaluation of three LLMs for Dialectal Arabic Sentiment Analysis: namely, ChatGPT based on GPT-3.5 and GPT-4, and Bard AI. We use a Saudi dialect Twitter dataset to assess their capability in sentiment text classification and generation. For classification, we compare the performance of fully fine-tuned Arabic BERT-based models with the LLMs in few-shot settings. For data generation, we evaluate the quality of the generated new sentiment samples using human and automatic evaluation methods. The experiments reveal that GPT-4 outperforms GPT-3.5 and Bard AI in sentiment analysis classification, rivaling the top-performing fully supervised BERT-based language model. However, in terms of data generation, compared to manually annotated authentic data, these generative models often fall short in producing high-quality Dialectal Arabic text suitable for sentiment analysis.