2023
HausaNLP at SemEval-2023 Task 12: Leveraging African Low Resource TweetData for Sentiment Analysis
Saheed Abdullahi Salahudeen | Falalu Ibrahim Lawan | Ahmad Wali | Amina Abubakar Imam | Aliyu Rabiu Shuaibu | Aliyu Yusuf | Nur Bala Rabiu | Musa Bello | Shamsuddeen Umaru Adamu | Saminu Mohammad Aliyu
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
We present the findings of SemEval-2023 Task 12, a shared task on sentiment analysis for low-resource African languages using a Twitter dataset. The task featured three subtasks: subtask A is monolingual sentiment classification with 12 tracks, one per language; subtask B is multilingual sentiment classification using the tracks in subtask A; and subtask C is zero-shot sentiment classification. We present the results and findings of subtasks A, B, and C. We also release the code on GitHub. Our goal is to leverage low-resource tweet data using pre-trained Afro-xlmr-large, AfriBERTa-Large, Bert-base-arabic-camelbert-da-sentiment (Arabic-camelbert), Multilingual-BERT (mBERT) and BERT models for sentiment analysis of 14 African languages. The datasets for these subtasks consist of gold-standard, multi-class labeled Twitter data from these languages. Our results demonstrate that the Afro-xlmr-large model outperformed the other models on most of the language datasets. Similarly, the Nigerian languages Hausa, Igbo, and Yoruba achieved better performance than the other languages, which can be attributed to the larger volume of data available for these languages.
HausaNLP at SemEval-2023 Task 10: Transfer Learning, Synthetic Data and Side-information for Multi-level Sexism Classification
Saminu Mohammad Aliyu | Idris Abdulmumin | Shamsuddeen Hassan Muhammad | Ibrahim Said Ahmad | Saheed Abdullahi Salahudeen | Aliyu Yusuf | Falalu Ibrahim Lawan
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
We present the findings of our participation in SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS), a shared task on offensive language (sexism) detection on an English Gab and Reddit dataset. We investigated the effects of transferring two language models, XLM-T (sentiment classification) and HateBERT (same domain: Reddit), for multi-level classification into Sexist or not Sexist, and for the subsequent sub-classifications of the sexist data. We also use synthetic classification of an unlabelled dataset and intermediary class information to maximize the performance of our models. We submitted a system in Task A, and it ranked 49th with an F1-score of 0.82. This result proved competitive, as it under-performed the best system by only 0.052% F1-score.