Fatima Zahra Qachfar


2024

DetectiveReDASers at HSD-2Lang 2024: A New Pooling Strategy with Cross-lingual Augmentation and Ensembling for Hate Speech Detection in Low-resource Languages
Fatima Zahra Qachfar | Bryan Tuck | Rakesh Verma
Proceedings of the 7th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2024)

This paper addresses hate speech detection in Turkish and Arabic tweets, contributing to the HSD-2Lang Shared Task. We propose a specialized pooling strategy within a soft-voting ensemble framework to improve classification in Turkish and Arabic language models. Our approach also includes expanding the training sets through cross-lingual translation, introducing a broader spectrum of hate speech examples. Our method attains F1-Macro scores of 0.6964 for Turkish (Subtask A) and 0.7123 for Arabic (Subtask B). While achieving these results, we also consider the computational overhead, striking a balance between the effectiveness of our unique pooling strategy, data augmentation, and soft-voting ensemble. This approach advances the practical application of language models in low-resource languages for hate speech detection.
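The soft-voting step named in the abstract can be illustrated with a minimal sketch: each model outputs per-class probabilities, the probabilities are averaged (optionally weighted), and the class with the highest average is chosen. The function name `soft_vote`, the optional `weights` argument, and the toy probabilities are assumptions made for illustration only; the paper's pooling strategy, models, and augmentation are not reproduced here.

```python
from typing import List, Optional, Sequence

# One row of class probabilities per example, for a single model.
Probs = Sequence[Sequence[float]]


def soft_vote(model_probs: Sequence[Probs],
              weights: Optional[Sequence[float]] = None) -> List[int]:
    """Average class probabilities across models and return the argmax label
    for each example (hypothetical helper, not the authors' implementation)."""
    if not model_probs:
        raise ValueError("need at least one model")
    n_models = len(model_probs)
    if weights is None:
        weights = [1.0] * n_models  # plain (unweighted) soft voting
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = sum(weights)

    labels: List[int] = []
    for i in range(len(model_probs[0])):
        n_classes = len(model_probs[0][i])
        # Weighted mean probability of each class for example i.
        avg = [
            sum(w * probs[i][c] for w, probs in zip(weights, model_probs)) / total
            for c in range(n_classes)
        ]
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels


if __name__ == "__main__":
    # Toy binary probabilities [not-hate, hate] from three hypothetical models.
    model_a = [[0.9, 0.1], [0.4, 0.6]]
    model_b = [[0.6, 0.4], [0.7, 0.3]]
    model_c = [[0.2, 0.8], [0.45, 0.55]]
    print(soft_vote([model_a, model_b, model_c]))  # prints [0, 0]
```

In the second toy example, two of the three models favor class 1, so a majority (hard) vote would pick class 1, while the averaged probabilities favor class 0. This is the practical difference between soft and hard voting: confident models carry more influence.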

Domain-Agnostic Adapter Architecture for Deception Detection: Extensive Evaluations with the DIFrauD Benchmark
Dainis A. Boumber | Fatima Zahra Qachfar | Rakesh Verma
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Despite significant strides in training expansive transformer models, their deployment for niche tasks remains intricate. This paper examines deception detection, assessing domain adaptation methodologies from a cross-domain perspective using transformer Large Language Models (LLMs). We introduce a new corpus of roughly 100,000 honest and misleading statements in seven domains, designed to serve as a benchmark for multidomain deception detection. As a primary contribution, we present a novel parameter-efficient finetuning adapter, PreXIA, which was proposed and implemented as part of this work. The design is model-, domain- and task-agnostic, with broad applications that are not limited to deception or classification tasks. We comprehensively analyze and rigorously evaluate LLM tuning methods and our original design using the new benchmark, highlighting their strengths, pointing out weaknesses, and suggesting potential areas for improvement. The proposed adapter consistently outperforms all competition on the DIFrauD benchmark used in this study. To the best of our knowledge, it improves on the state-of-the-art in its class for the deception task. In addition, the evaluation leads to unexpected findings that, at the very least, cast doubt on conclusions in some recently published research that reasoning ability unequivocally dominates representation quality in its contribution to a model’s performance and predictions.

2023

DetectiveRedasers at ArAIEval Shared Task: Leveraging Transformer Ensembles for Arabic Deception Detection
Bryan Tuck | Fatima Zahra Qachfar | Dainis Boumber | Rakesh Verma
Proceedings of ArabicNLP 2023

This paper outlines a methodology aimed at combating disinformation in Arabic social media, a strategy that secured a first-place finish in tasks 2A and 2B at the ArAIEval shared task during the ArabicNLP 2023 conference. Our team, DetectiveRedasers, developed a hyperparameter-optimized pipeline centered around individual BERT-based models for the Arabic language, enhanced by a soft-voting ensemble strategy. Subsequent evaluation on the test dataset reveals that ensembles, although generally resilient, do not always outperform individual models. The primary contribution of this paper is its multifaceted strategy, which led to winning solutions for both binary (2A) and multiclass (2B) disinformation classification tasks.

ReDASPersuasion at ArAIEval Shared Task: Multilingual and Monolingual Models For Arabic Persuasion Detection
Fatima Zahra Qachfar | Rakesh Verma
Proceedings of ArabicNLP 2023

To enhance persuasion detection, we investigate the use of multilingual systems on Arabic data by conducting a total of 22 experiments using baselines, multilingual, and monolingual language transformers. Our aim is to provide a comprehensive evaluation of the various systems employed throughout this task, with the ultimate goal of comparing their performance and identifying the most effective approach. Our empirical analysis shows that the *ReDASPersuasion* system performs best when combined with the multilingual “XLM-RoBERTa” and with monolingual transformers pre-trained on Arabic dialects, such as “CAMeLBERT-DA SA”, depending on the NLP classification task.

ReDASPersuasion at SemEval-2023 Task 3: Persuasion Detection using Multilingual Transformers and Language Agnostic Features
Fatima Zahra Qachfar | Rakesh Verma
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper describes a multilingual persuasion detection system that incorporates persuasion technique attributes for a multi-label classification task. The proposed method has two advantages. First, it combines persuasion features with a sequence classification transformer to classify persuasion techniques. Second, it is a language-agnostic approach that supports a total of 100 languages, guaranteed by the multilingual transformer module and the Google translator interface. We found that our persuasion system outperformed the SemEval baseline in all languages except the zero-shot prediction languages, which were not the main focus of our research. Our highest F1-Micro score, 0.45 for Italian, placed eighth on the leaderboard.