Hadda Cherroun


2024

MODOS at ArAIEval Shared Task: Multimodal Propagandistic Memes Classification Using Weighted SAM, CLIP and ArabianGPT
Abdelhamid Haouhat | Hadda Cherroun | Slimane Bellaouar | Attia Nehar
Proceedings of The Second Arabic Natural Language Processing Conference

Propaganda is increasingly used on Arabic social media platforms to deceive or influence people, and it is often spread through multimodal content such as memes. Substantial research has addressed the automatic detection of propaganda in English content; this paper turns to Arabic and presents the MODOS team’s participation in the Arabic Multimodal Propagandistic Memes Classification shared task. Our system uses the Segment Anything Model (SAM) and CLIP for image representation and ArabianGPT embeddings for text. We then employ LSTM encoders followed by a weighted fusion strategy to perform binary classification. Our system achieved competitive performance in distinguishing between propagandistic and non-propagandistic memes, with a macro F1 of 0.7290, ranking 6th among the participants.
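The weighted fusion step and the macro F1 metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the MODOS implementation: it assumes each modality has already been reduced to one propaganda probability per meme (in the paper these would come from the SAM/CLIP and ArabianGPT branches after the LSTM encoders), and the names `weighted_fusion`, `classify`, `macro_f1`, the weight `image_weight`, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def weighted_fusion(image_probs: Sequence[float],
                    text_probs: Sequence[float],
                    image_weight: float = 0.5) -> List[float]:
    """Convex combination of per-modality probabilities (assumed fusion form)."""
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    if len(image_probs) != len(text_probs):
        raise ValueError("both modalities must score the same memes")
    text_weight = 1.0 - image_weight
    return [image_weight * p_img + text_weight * p_txt
            for p_img, p_txt in zip(image_probs, text_probs)]


def classify(fused_probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binary decision: 1 = propagandistic, 0 = non-propagandistic."""
    return [1 if p >= threshold else 0 for p in fused_probs]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores over classes 0 and 1."""
    scores = []
    for label in (0, 1):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy scores for four memes; the values are made up for illustration.
    image_probs = [0.9, 0.2, 0.6, 0.1]
    text_probs = [0.7, 0.4, 0.2, 0.3]
    gold = [1, 0, 1, 0]

    fused = weighted_fusion(image_probs, text_probs, image_weight=0.6)
    predictions = classify(fused)
    print(predictions, round(macro_f1(gold, predictions), 4))
```

Macro F1 averages the F1 of both classes with equal weight, so a system cannot score well by favoring the majority class alone, which is one reason it suits a binary task with imbalanced labels.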

2023

AraBERT and mBert: Insights from Psycholinguistic Diagnostics
Basma Sayah | Attia Nehar | Hadda Cherroun | Slimane Bellaouar
Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023)

2021

User Generated Content and Engagement Analysis in Social Media case of Algerian Brands
Aicha Chorana | Hadda Cherroun
Proceedings of the 4th International Conference on Natural Language and Speech Processing (ICNLSP 2021)

2019

A Crowdsourcing-based Approach for Speech Corpus Transcription Case of Arabic Algerian Dialects
Ilyes Zine | Mohamed Cherif Zeghad | Soumia Bougrine | Hadda Cherroun
Proceedings of the 3rd International Conference on Natural Language and Speech Processing

2017

Toward a Web-based Speech Corpus for Algerian Dialectal Arabic Varieties
Soumia Bougrine | Aicha Chorana | Abdallah Lakhdari | Hadda Cherroun
Proceedings of the Third Arabic Natural Language Processing Workshop

The success of machine learning for automatic speech processing has raised the need for large-scale datasets. However, collecting such data is often challenging, as it requires a significant investment of time and money. In this paper, we devise a recipe for building large-scale speech corpora by harnessing Web resources, namely YouTube, other social media, and online radio and TV. We illustrate our methodology by building KALAM’DZ, an Arabic spoken corpus dedicated to Algerian dialectal varieties. The preliminary version of our dataset covers all major Algerian dialects. In addition, we make sure that this material takes into account numerous aspects that foster its richness; in particular, we have targeted various speech topics. Some automatic and manual annotations are provided. They give information about the speakers and about the sub-dialect at the utterance level. Our corpus encompasses the 8 major Algerian Arabic sub-dialects, with 4881 speakers and more than 104.4 hours of speech segmented into utterances of at least 6 s.