dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features

Mohamed Lichouri, Khaled Lounnas, Zahaf Nadjib, Rabiai Ayoub


Abstract
This paper presents the contribution of our dzNLP team to the NADI 2024 shared task, specifically in Subtask 1 - Multi-label Country-level Dialect Identification (MLDID) (Closed Track). We explored various configurations to address the challenge: in Experiment 1, we utilized a union of n-gram analyzers (word, character, character with word boundaries) with different n-gram values; in Experiment 2, we combined a weighted union of Term Frequency-Inverse Document Frequency (TF-IDF) features with various weights; and in Experiment 3, we implemented a weighted major voting scheme using three classifiers: Linear Support Vector Classifier (LSVC), Random Forest (RF), and K-Nearest Neighbors (KNN).Our approach, despite its simplicity and reliance on traditional machine learning techniques, demonstrated competitive performance in terms of accuracy and precision. Notably, we achieved the highest precision score of 63.22% among the participating teams. However, our overall F1 score was approximately 21%, significantly impacted by a low recall rate of 12.87%. This indicates that while our models were highly precise, they struggled to recall a broad range of dialect labels, highlighting a critical area for improvement in handling diverse dialectal variations.
Anthology ID:
2024.arabicnlp-1.84
Volume:
Proceedings of The Second Arabic Natural Language Processing Conference
Month:
August
Year:
2024
Address:
Bangkok, Thailand
Editors:
Nizar Habash, Houda Bouamor, Ramy Eskander, Nadi Tomeh, Ibrahim Abu Farha, Ahmed Abdelali, Samia Touileb, Injy Hamed, Yaser Onaizan, Bashar Alhafni, Wissam Antoun, Salam Khalifa, Hatem Haddad, Imed Zitouni, Badr AlKhamissi, Rawan Almatham, Khalil Mrini
Venues:
ArabicNLP | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
754–757
Language:
URL:
https://aclanthology.org/2024.arabicnlp-1.84
DOI:
Bibkey:
Cite (ACL):
Mohamed Lichouri, Khaled Lounnas, Zahaf Nadjib, and Rabiai Ayoub. 2024. dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features. In Proceedings of The Second Arabic Natural Language Processing Conference, pages 754–757, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal):
dzNLP at NADI 2024 Shared Task: Multi-Classifier Ensemble with Weighted Voting and TF-IDF Features (Lichouri et al., ArabicNLP-WS 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.arabicnlp-1.84.pdf