Ahsan Adeel


2025

Single layer tiny Co4 outpaces GPT-2 and GPT-BERT
Noor Ul Zain | Mohsin Raza Naseem | Ahsan Adeel
Proceedings of the First BabyLM Workshop

We show that a tiny Co4 machine (CITATION) with a single layer, two heads, and 8M parameters, operating at O(N) computational cost (where N is the number of input tokens), in just 2 epochs outpaces GPT-2 (124M, 12 layers, O(N^2)) and GPT-BERT (30M, 12 layers, O(N^2)), both trained for 10 epochs. Co4 achieves orders-of-magnitude greater training efficiency on 10M tokens, demonstrating sample-efficient pretraining. On the BabyLM challenge evaluation pipeline, Co4 performs comparably or better across complex benchmarks, showing strong zero-shot and fine-tuning performance on SuperGLUE tasks. Specifically, Co4 outperforms GPT-2 in 5 out of 7 zero-shot metrics and 6 out of 7 fine-tuning tasks, and GPT-BERT in 4 out of 7 metrics in both cases. These results strongly suggest a need to rethink prevailing deep learning paradigms and associated scaling laws.

2022

Ara-Women-Hate: An Annotated Corpus Dedicated to Hate Speech Detection against Women in the Arabic Community
Imane Guellil | Ahsan Adeel | Faical Azouaou | Mohamed Boubred | Yousra Houichi | Akram Abdelhaq Moumna
Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference

In this paper, an approach for hate speech detection against women in the Arabic community on social media (e.g. YouTube) is proposed. In the literature, similar works have been presented for other languages such as English. However, to the best of our knowledge, little work has been conducted for the Arabic language. A new hate speech corpus (Arabic_fr_en) is developed using three different annotators. For corpus validation, three different machine learning algorithms are used: a deep Convolutional Neural Network (CNN), a long short-term memory (LSTM) network, and a Bi-directional LSTM (Bi-LSTM) network. Simulation results demonstrate the best performa

2018

Arabizi sentiment analysis based on transliteration and automatic corpus annotation
Imane Guellil | Ahsan Adeel | Faical Azouaou | Fodil Benali | Ala-eddine Hachani | Amir Hussain
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Arabizi is a form of writing Arabic text which relies on Latin letters, numerals and punctuation rather than Arabic letters. In the literature, the difficulties associated with Arabizi sentiment analysis have been underestimated, principally due to the complexity of Arabizi. In this paper, we present an approach to automatically classify the sentiment of Arabizi messages as positive or negative. In the proposed approach, Arabizi messages are first transliterated into Arabic. Afterwards, we automatically classify the sentiment of the transliterated corpus using an automatically annotated corpus. For corpus validation, shallow machine learning algorithms such as Support Vector Machine (SVM) and Naive Bayes (NB) are used. Simulation results show that the NB algorithm outperforms all others. The highest achieved F1-score is up to 78% and 76% for the manually and automatically transliterated datasets, respectively. Ongoing work is aimed at improving the transliterator module and the annotated sentiment dataset.