Mariam Labib
Also published as: Mariam Francies
2026
REGLAT at SemEval-2026 Task 12: Multi-Strategy Ensemble Reasoning for Event Causality Identification
Mariam Francies | Nsrin Ashraf | Ahmed Fetouh | Asad Khalil | Hamada Nayel
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes the multi-strategy ensemble system we submitted to the Abductive Event Reasoning shared task. The system combines semantic similarity, causal pattern recognition, and Large Language Models (LLMs) to identify causal relationships between news events and their causes. It achieved competitive performance by integrating semantic embedding-based similarity, explicit causal pattern matching, keyword overlap analysis, temporal alignment scoring, and LLM-enhanced reasoning. On the development set, the LLM-enhanced configuration reached an accuracy of 65.4% and the non-LLM ensemble 43.2%. On the official test-set leaderboard, the final score was 0.3.
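The abstract lists several scoring strategies but not how they are combined. As a minimal sketch, assuming each strategy yields a score in [0, 1] and the scores are merged by a weighted average, candidate causes could be ranked as below. Only keyword overlap (Jaccard) and a cue-phrase pattern match are implemented; the cue list, the weights, and the `external` hook for embedding, temporal, or LLM scores are hypothetical and not taken from the paper.

```python
import re

# Hypothetical cue list; the system's actual causal patterns are not specified.
CAUSAL_CUES = re.compile(
    r"\b(because|due to|caused by|led to|as a result|resulted in)\b", re.IGNORECASE
)


def tokens(text):
    """Lower-cased word set (Unicode-aware, so it also works for non-Latin scripts)."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_overlap(event, candidate):
    """Jaccard overlap between the word sets of the event and the candidate cause."""
    a, b = tokens(event), tokens(candidate)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def causal_pattern_score(candidate):
    """1.0 if the candidate contains an explicit causal cue phrase, else 0.0."""
    return 1.0 if CAUSAL_CUES.search(candidate) else 0.0


def fuse(scores, weights):
    """Weighted average of the strategy scores that are present (each in [0, 1])."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * value for name, value in scores.items()) / total


def rank_candidates(event, candidates, weights, external=None):
    """Return (candidate, fused score) pairs sorted from most to least plausible cause.

    `external` optionally maps a candidate to precomputed scores for strategies not
    implemented here (e.g. embedding similarity, temporal alignment, an LLM judgement).
    """
    external = external or {}
    ranked = []
    for cand in candidates:
        scores = {
            "keyword": keyword_overlap(event, cand),
            "pattern": causal_pattern_score(cand),
            **external.get(cand, {}),
        }
        ranked.append((cand, fuse(scores, weights)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    event = "Factory shut down in the city"
    candidates = [
        "Local team won the football match",
        "Power outage in the city led to the factory shut down",
    ]
    for cand, score in rank_candidates(event, candidates, {"keyword": 0.5, "pattern": 0.5}):
        print(f"{score:.3f}  {cand}")
```

A weighted average keeps the fused score on the same [0, 1] scale regardless of how many strategies contribute for a given candidate.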
REGLAT at SemEval-2026 Task 9: Enhancing Arabic Online Polarization Detection Using AraBERT and Synonym Replacement Augmentation
Ahmed Fetouh | Mariam Francies | Nsrin Ashraf | Hamada Nayel | Rahmath Mohammed
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
In this paper, we present our system submitted to SemEval-2026 Task 9 (Subtask 1: Polarization Detection), which focuses on binary classification of polarized content in Arabic social media text. To address Arabic linguistic variation, we propose a single-model approach that combines fine-tuned AraBERT with synonym-based data augmentation. On the Arabic blind test set, our method achieves a competitive macro F1-score of 0.831 and an accuracy of 0.833. Among the 45 participating teams, our system ranked 11th overall, 0.018 macro F1 behind the top-ranked team (0.8488). The results show that fine-tuned AraBERT with synonym replacement is a strong, simple, and reproducible baseline that outperforms more complex setups in capturing the nuances of polarized attitudes in Arabic.
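Synonym replacement augments the training set by copying a labeled sentence and swapping some of its words for synonyms, leaving the label unchanged. The sketch below illustrates only that augmentation step (AraBERT fine-tuning is not shown). The one-entry synonym table, whitespace tokenization, and the helper names `synonym_replace` and `augment` are hypothetical stand-ins for whatever lexical resource and pipeline the system actually used.

```python
import random

# Toy, hypothetical synonym table ("big" -> "huge", "great"); a real system would
# draw candidates from a much larger Arabic lexical resource.
SYNONYMS = {"كبير": ["ضخم", "عظيم"]}


def synonym_replace(text, synonyms, n_replacements=1, seed=None):
    """Return `text` with up to `n_replacements` words swapped for a random synonym."""
    rng = random.Random(seed)
    words = text.split()
    positions = [i for i, w in enumerate(words) if w in synonyms]
    rng.shuffle(positions)
    for i in positions[:n_replacements]:
        words[i] = rng.choice(synonyms[words[i]])
    return " ".join(words)


def augment(dataset, synonyms, copies=1, seed=0):
    """Add synonym-replaced copies of (text, label) pairs; labels are left unchanged."""
    rng = random.Random(seed)
    augmented = list(dataset)
    for text, label in dataset:
        for _ in range(copies):
            new_text = synonym_replace(text, synonyms, seed=rng.randrange(2**32))
            if new_text != text:  # skip sentences with no replaceable word
                augmented.append((new_text, label))
    return augmented
```

For example, `augment([("هذا خطأ كبير", 1)], SYNONYMS)` returns the original pair plus one copy in which "كبير" is replaced by "ضخم" or "عظيم", still labeled 1.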
REGLAT at AbjadGenEval: Multi-Model Ensemble Approach for Arabic AI-Generated Text Detection
Mariam Labib | Nsrin Ashraf | Ahmed M. Fetouh | Hamada Nayel
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
The rapid advancement of large language models necessitates robust methods for detecting AI-generated Arabic text. This paper presents our system for distinguishing human-written from machine-generated Arabic content. We propose a weighted ensemble of AraBERTv2 and BERT-base-arabic, trained with 5-fold stratified cross-validation and class-balanced loss functions. Our methodology incorporates Arabic text normalization, data augmentation with 16,678 samples drawn from external scientific abstracts, and decision-threshold optimization that prioritizes recall. On the official test set, our system achieved an F1-score of 0.763, an accuracy of 0.695, a precision of 0.624, and a recall of 0.980: it detects machine-generated texts with very few false negatives, at the cost of elevated false positives. Our analysis highlights the precision-recall trade-off and the challenges of cross-domain generalization in Arabic AI-generated text detection.
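The abstract specifies a weighted ensemble and a recall-prioritizing threshold, but not the ensemble weights or the selection criterion. One possible reading, sketched below under those assumptions, averages the two models' probabilities of "machine-generated" with a hypothetical weight `weight_a` and picks the decision threshold that maximizes F-beta with beta = 2 on held-out predictions (beta > 1 weights recall more heavily than precision). The F2 criterion is an illustrative choice, not the paper's stated method.

```python
def weighted_ensemble(probs_a, probs_b, weight_a=0.5):
    """Convex combination of two models' P(machine-generated) scores."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]


def f_beta(labels, preds, beta):
    """F-beta for the positive class (1 = machine-generated)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def pick_threshold(labels, scores, beta=2.0):
    """Try each observed score as a cut-off; keep the one with the best F-beta."""
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f = f_beta(labels, preds, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

With beta = 2, lowering the threshold is rewarded whenever it recovers missed machine-generated texts, even at some cost in false positives, which mirrors the high-recall, lower-precision profile reported above.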
REGLAT at AbjadMed: Handling Imbalanced Arabic Medical Text Classification via Hierarchical KNN-MLP Architecture
Ahmed M. Fetouh | Mohammed Rahmath | Omer Dawood | Mariam Labib | Nsrin Ashraf | Hamada Nayel
Proceedings of the 2nd Workshop on NLP for Languages Using Arabic Script
In this paper, we describe the system submitted to the shared task on Arabic medical text classification. We propose a single-model approach that combines embeddings from a fine-tuned LLM with hierarchical classical classifiers, achieving a competitive macro F1-score of 0.46 on the blind test set. We explored various modeling strategies, including tree-based ensembles, LLMs, and hierarchical correction for rare classes, highlighting the effectiveness of domain-specific fine-tuning in low-resource settings. The results demonstrate that a single fine-tuned Arabic BERT variant can serve as a strong baseline under extreme class imbalance, surpassing more complex ensembles in simplicity and reproducibility.
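The abstract does not detail the hierarchy, so the following is only a loose illustration of one hierarchical k-NN pattern: a first vote over embedding vectors picks a coarse group of classes, and a second vote picks a fine class restricted to that group. The toy two-dimensional vectors, the `group_of` mapping, and k are hypothetical, and the MLP stage named in the title is not reproduced.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_predict(query, vectors, labels, k=3):
    """Similarity-weighted vote among the k training vectors closest to `query`."""
    nearest = sorted(
        ((cosine(query, vec), label) for vec, label in zip(vectors, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, label in nearest:
        votes[label] += sim
    return max(votes, key=votes.get)


def hierarchical_knn(query, vectors, fine_labels, group_of, k=3):
    """Two-level k-NN: choose a coarse group, then a fine class within that group only.

    Restricting the second vote to one group means classes from other groups
    cannot compete with the group's (possibly rare) classes.
    """
    group = knn_predict(query, vectors, [group_of[y] for y in fine_labels], k)
    members = [(v, y) for v, y in zip(vectors, fine_labels) if group_of[y] == group]
    return knn_predict(query, [v for v, _ in members], [y for _, y in members], k)
```

In a real pipeline, `vectors` would be the fine-tuned model's embeddings of the training documents rather than hand-written coordinates.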
2025
Inside the Box: A Streamlined Model for AI-Generated News Article Detection
Nsrin Ashraf | Mariam Labib | Hamada Nayel
Proceedings of the Shared Task on Multi-Domain Detection of AI-Generated Text
The rapid proliferation of AI-generated text has raised growing concerns about authenticity, authorship, and the spread of misinformation, making accurate and efficient detection a pressing challenge. In this study, we propose a simple yet effective system for classifying text as AI-generated or human-written. Rather than relying on complex or resource-intensive deep learning architectures, our approach combines classical machine learning algorithms with TF-IDF text representations. Evaluated on the M-DAIGT shared task dataset, our Support Vector Machine (SVM) based system ranked second on the official leaderboard and performed competitively across all evaluation metrics. These findings highlight the potential of lightweight traditional models for text authenticity detection, particularly in low-resource or real-time applications where interpretability and efficiency are essential.
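The abstract names TF-IDF features with an SVM classifier but gives no preprocessing or hyperparameters. As a dependency-free sketch (not the authors' implementation), the code below builds L2-normalized TF-IDF vectors using the smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and trains a linear SVM with Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. Whitespace tokenization, `lam`, the epoch count, the omitted bias term, and the toy sentences are all assumptions; in practice a library such as scikit-learn would normally be used.

```python
import math
import random
from collections import Counter


def fit_tfidf(docs):
    """Learn smoothed IDF weights: idf(t) = ln((1 + n) / (1 + df(t))) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc.lower().split()))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf_vector(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen during fitting are dropped."""
    tf = Counter(t for t in doc.lower().split() if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def score(w, x):
    """Linear decision value w . x for sparse dict vectors."""
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(vectors, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style SGD on the L2-regularised hinge loss (labels in {-1, +1}, no bias)."""
    rng = random.Random(seed)
    w = {}
    t = 0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = labels[i] * score(w, vectors[i])
            # Regularisation step: scale w by (1 - eta * lam), i.e. (1 - 1/t).
            shrink = 1.0 - 1.0 / t
            w = {f: v * shrink for f, v in w.items()}
            if margin < 1.0:
                # Hinge loss is active: push w towards the correct side of the margin.
                for f, v in vectors[i].items():
                    w[f] = w.get(f, 0.0) + eta * labels[i] * v
    return w


def predict(w, idf, text):
    """+1 (AI-generated) if the decision value is positive, otherwise -1 (human-written)."""
    return 1 if score(w, tfidf_vector(text, idf)) > 0 else -1


if __name__ == "__main__":
    docs = [
        "furthermore this comprehensive analysis delves deeper",
        "moreover it is important to note the multifaceted landscape",
        "lol cant believe that game last night",
        "honestly gonna skip class tmrw",
    ]
    labels = [1, 1, -1, -1]
    idf = fit_tfidf(docs)
    w = train_linear_svm([tfidf_vector(d, idf) for d in docs], labels)
    print([predict(w, idf, d) for d in docs])
```

A linear model over sparse TF-IDF features is cheap to train and to apply, and each learned weight maps back to a single term, which is the interpretability and efficiency argument the abstract makes.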