Amr Keleg
2026
Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues
Muhammad Dehan Al Kautsar | Saeed Almheiri | Momina Ahsan | Bilal Elbouardi | Younes Samih | Sarfraz Ahmad | Amr Keleg | Omar El Herraoui | Kareem Elzeky | Abed Alhakim Freihat | Mohamed Anwar | Zhuohan Xie | Junhong Liang | Mohammad Rustom Al Nasar | Preslav Nakov | Fajri Koto
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
There is a significant gap in evaluating cultural reasoning in LLMs using conversational datasets that capture culturally rich and dialectal contexts. Most Arabic benchmarks focus on short text snippets in Modern Standard Arabic (MSA), overlooking the cultural nuances that naturally arise in dialogues. To address this gap, we introduce ArabCulture-Dialogue, a culturally grounded conversational dataset covering 13 Arabic-speaking countries, in both MSA and each country’s respective dialect, spanning 12 daily-life topics and 54 fine-grained subtopics. We utilize the dataset to form three benchmarking tasks: (i) multiple-choice cultural reasoning, (ii) machine translation between MSA and dialects, and (iii) dialect-steering generation. Our experiments indicate that the performance gap between MSA and Arabic dialects still exists, whereby the models perform worse on all three tasks in the dialectal setup, compared to the MSA one.
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
Pedro Ortiz Suarez | Laurie Burchell | Catherine Arnett | Rafael Mosquera | Sara Hincapié Monsalve | Thom Vaughan | Damian Stewart | Malte Ostendorff | Idris Abdulmumin | Vukosi Marivate | Shamsuddeen Hassan Muhammad | Atnafu Lambebo Tonja | Hend Al-Khalifa | Nadia Ghezaiel Hammouda | Verrah Akinyi Otiende | Tack Hwa Wong | Jakhongir Saydaliev | Melika Nobakhtian | Muhammad Ravi Shulthan Habibi | Chalamalasetti Kranti | Carol Muchemi | Khang Nguyen | Faisal Muhammad Adam | Luis Frentzen Salim | Reem Alqifari | Cynthia Jayne Amol | Joseph Marvin Imperial | Ilker Kesen | Ahmad Mustafid | Pavel Stepachev | Leshem Choshen | David Anugraha | Hamada Nayel | Seid Muhie Yimam | Vallerie Alexandra Putra | My Chiffon Nguyen | Azmine Toushik Wasi | Gouthami Vadithya | Rob Van Der Goot | Lanwenn ar C'horr | Karan Dua | Andrew Yates | Mithil Bangera | Yeshil Bangera | Hitesh Laxmichand Patel | Shu Okabe | Fenal Ashokbhai Ilasariya | Dmitry Gaynullin | Genta Indra Winata | Yiyuan Li | Juan Pablo Martínez | Amit Agarwal | Ikhlasul Akmal Hanif | Raia Abu Ahmad | Esther Adenuga | Filbert Aurelian Tjiaranata | Weerayut Buaphet | Michael Anugraha | Sowmya Vajjala | Benjamin L Rice | Azril Hafizi Amirudin | Jesujoba Oluwadara Alabi | Srikant Panda | Yassine Toughrai | Bruhan Kyomuhendo | Daniel Ruffinelli | Akshata | Manuel Goulão | Ej Zhou | Ingrid Gabriela Franco Ramirez | Cristina Aggazzotti | Konstantin Dobler | Jun Kevin | Quentin Pagès | Nicholas Andrews | Nuhu Ibrahim | Mattes Ruckdeschel | Amr Keleg | Mike Zhang | Casper Rufaro Muziri | Saron Samuel | Sotaro Takeshita | Kun Kerdthaisong | Luca Foppiano | Rasul Dent | Tommaso Green | Ahmad Mustapha Wali | Kamohelo Makaaka | Vicky Feliren | Inshirah Idris | Hande Celikkanat | Abdulhamid Abubakar | Jean Maillard | Benoît Sagot | Thibault Clérice | Kenton Murray | Sarah K. K. Luger
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Language identification (LID) is a fundamental step in curating multilingual corpora. However, LID models still perform poorly for many languages, especially on the noisy and heterogeneous web data often used to train multilingual language models. In this paper, we introduce CommonLID, a community-driven, human-annotated LID benchmark for the web domain, covering 109 languages. Many of the included languages have been previously under-served, making CommonLID a key resource for developing more representative high-quality text corpora. We show CommonLID’s value by using it, alongside five other common evaluation sets, to test eight popular LID models. We analyse our results to situate our contribution and to provide an overview of the state of the art. In particular, we highlight that existing evaluations overestimate LID accuracy for many languages in the web domain. We make CommonLID and the code used to create it available under an open, permissive license.
Curriculum Learning and Pseudo-Labeling Improve the Generalization of Multi-Label Arabic Dialect Identification Models
Ali Mekky | Mohamed El Zeftawy | Lara Hassan | Amr Keleg | Preslav Nakov
Proceedings of the 13th Workshop on NLP for Similar Languages, Varieties and Dialects
Arabic Dialect Identification (ADI) has long been modeled as a single-label classification task, but recent work has argued that it should be framed as a multi-label classification task. However, ADI remains constrained by the availability of single-label datasets, with no large-scale multi-label resources available for training. By analyzing models trained on single-label ADI data, we show that the main difficulty in repurposing such datasets for Multi-Label Arabic Dialect Identification (MLADI) lies in the selection of negative samples, as many sentences treated as negative could be acceptable in multiple dialects. To address these issues, we construct a multi-label dataset by generating automatic multi-label annotations using GPT-4o and binary dialect acceptability classifiers, with aggregation guided by the Arabic Level of Dialectness (ALDi). Afterward, we train a BERT-based multi-label classifier using curriculum learning strategies aligned with dialectal complexity and label cardinality. On the MLADI leaderboard, our best-performing LahjatBERT model achieves a macro F1 of 0.69, compared to 0.55 for the strongest previously reported system.
Controlling Distributional Bias in Multi-Round LLM Generation via KL-Optimized Fine-Tuning
Yanbei Jiang | Amr Keleg | Ryandito Diandaru | Jey Han Lau | Lea Frermann | Biaoyan Fang | Fajri Koto
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While the real world is inherently stochastic, Large Language Models (LLMs) are predominantly evaluated on single-round inference against fixed ground truths. In this work, we shift the lens to distribution alignment: assessing whether LLMs, when prompted repeatedly, can generate outputs that adhere to a desired target distribution, e.g. reflecting real-world statistics or a uniform distribution. We formulate distribution alignment using the attributes of gender, race, and sentiment within occupational contexts. Our empirical analysis reveals that off-the-shelf LLMs and standard alignment techniques, including prompt engineering and Direct Preference Optimization, fail to reliably control output distributions. To bridge this gap, we propose a novel fine-tuning framework that couples Steering Token Calibration with Semantic Alignment. We introduce a hybrid objective function combining Kullback-Leibler divergence to anchor the probability mass of latent steering tokens and Kahneman-Tversky Optimization to bind these tokens to semantically consistent responses. Experiments across six diverse datasets demonstrate that our approach significantly outperforms baselines, achieving precise distributional control in attribute generation tasks.
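The notion of distribution alignment described in this abstract can be illustrated with a minimal sketch: prompt a model repeatedly, tally one attribute across the outputs, and compare the empirical distribution to the target with a KL divergence. The sample values, the helper names, and the choice of KL(target || observed) as the direction are assumptions made for illustration; they are not taken from the paper and do not represent its training objective.

```python
import math
from collections import Counter


def empirical_distribution(samples, categories):
    """Relative frequency of each category among the sampled attribute values."""
    if not samples:
        raise ValueError("at least one sample is required")
    counts = Counter(samples)
    return {c: counts.get(c, 0) / len(samples) for c in categories}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against division by zero when q[c] == 0."""
    return sum(p[c] * math.log(p[c] / max(q[c], eps)) for c in p if p[c] > 0)


# Hypothetical attribute values extracted from 10 repeated generations.
samples = ["female"] * 2 + ["male"] * 8
# Assumed target: a uniform distribution over the two values.
target = {"female": 0.5, "male": 0.5}

observed = empirical_distribution(samples, target.keys())
divergence = kl_divergence(target, observed)
print(f"observed={observed}, KL(target || observed)={divergence:.4f}")
```

A divergence of zero means the repeated outputs match the target exactly; larger values indicate a stronger skew toward some attribute values.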
2025
Revisiting Common Assumptions about Arabic Dialects in NLP
Amr Keleg | Sharon Goldwater | Walid Magdy
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Arabic has diverse dialects, where one dialect can be substantially different from the others. In the NLP literature, some assumptions about these dialects are widely adopted (e.g., “Arabic dialects can be grouped into distinguishable regional dialects”) and are manifested in different computational tasks such as Arabic Dialect Identification (ADI). However, these assumptions are not quantitatively verified. We identify four of these assumptions and examine them by extending and analyzing a multi-label dataset, where the validity of each sentence in 11 different country-level dialects is manually assessed by speakers of these dialects. Our analysis indicates that the four assumptions oversimplify reality, and some of them are not always accurate. This in turn might be hindering further progress in different Arabic NLP tasks.
LLM Alignment for the Arabs: A Homogenous Culture or Diverse Ones
Amr Keleg
Proceedings of the 3rd Workshop on Cross-Cultural Considerations in NLP (C3NLP 2025)
Large Language Models (LLMs) have the potential to be useful tools that can automate tasks and assist humans. However, these models are more fluent in English and more aligned with Western cultures, norms, and values. Arabic-specific LLMs are being developed to better capture the nuances of the Arabic language and the views of the Arabs. However, Arabs are sometimes assumed to share the same culture. In this position paper, we discuss the limitations of this assumption and provide our recommendations for how to curate better alignment data that models the cultural diversity within the Arab world.
2024
NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task
Muhammad Abdul-Mageed | Amr Keleg | AbdelRahim Elmadany | Chiyu Zhang | Injy Hamed | Walid Magdy | Houda Bouamor | Nizar Habash
Proceedings of the Second Arabic Natural Language Processing Conference
We describe the findings of the fifth Nuanced Arabic Dialect Identification Shared Task (NADI 2024). NADI’s objective is to help advance SoTA Arabic NLP by providing guidance, datasets, modeling opportunities, and standardized evaluation conditions that allow researchers to collaboratively compete on prespecified tasks. NADI 2024 targeted dialect identification cast as a multi-label task (Subtask 1), identification of the Arabic level of dialectness (Subtask 2), and dialect-to-MSA machine translation (Subtask 3). A total of 51 unique teams registered for the shared task, of which 12 participated (with 76 valid submissions during the test phase). Among these, three teams participated in Subtask 1, three in Subtask 2, and eight in Subtask 3. The winning teams achieved 50.57 F1 on Subtask 1, 0.1403 RMSE on Subtask 2, and 20.44 BLEU on Subtask 3, respectively. Results show that Arabic dialect processing tasks such as dialect identification and machine translation remain challenging. We describe the methods employed by the participating teams and briefly offer an outlook for NADI.
Estimating the Level of Dialectness Predicts Inter-annotator Agreement in Multi-dialect Arabic Datasets
Amr Keleg | Walid Magdy | Sharon Goldwater
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
When annotating multi-dialect Arabic datasets, it is common to randomly assign the samples across a pool of native Arabic speakers. Recent analyses recommended routing dialectal samples to native speakers of their respective dialects to build higher-quality datasets. However, automatically identifying the dialect of samples is hard. Moreover, the pool of annotators who are native speakers of specific Arabic dialects might be scarce. Arabic Level of Dialectness (ALDi) was recently introduced as a quantitative variable that measures how sentences diverge from Standard Arabic. When samples are randomly assigned to annotators, we hypothesize that samples with higher ALDi scores are harder to label, especially if they are written in dialects that the annotators do not speak. We test this by analyzing the relation between ALDi scores and the annotators’ agreement on 15 public datasets having raw individual sample annotations for various sentence-classification tasks. We find strong evidence supporting our hypothesis for 11 of them. Consequently, we recommend prioritizing routing samples of high ALDi scores to native speakers of each sample’s dialect, for which the dialect could be automatically identified at higher accuracies.
2023
Proceedings of ArabicNLP 2023
Hassan Sawaf | Samhaa El-Beltagy | Wajdi Zaghouani | Walid Magdy | Ahmed Abdelali | Nadi Tomeh | Ibrahim Abu Farha | Nizar Habash | Salam Khalifa | Amr Keleg | Hatem Haddad | Imed Zitouni | Khalil Mrini | Rawan Almatham
Proceedings of ArabicNLP 2023
ALDi: Quantifying the Arabic Level of Dialectness of Text
Amr Keleg | Sharon Goldwater | Walid Magdy
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Transcribed speech and user-generated text in Arabic typically contain a mixture of Modern Standard Arabic (MSA), the standardized language taught in schools, and Dialectal Arabic (DA), used in daily communications. To handle this variation, previous work in Arabic NLP has focused on Dialect Identification (DI) on the sentence or the token level. However, DI treats the task as binary, whereas we argue that Arabic speakers perceive a spectrum of dialectness, which we operationalize at the sentence level as the Arabic Level of Dialectness (ALDi), a continuous linguistic variable. We introduce the AOC-ALDi dataset (derived from the AOC dataset), containing 127,835 sentences (17% from news articles and 83% from user comments on those articles) which are manually labeled with their level of dialectness. We provide a detailed analysis of AOC-ALDi and show that a model trained on it can effectively identify levels of dialectness on a range of other corpora (including dialects and genres not included in AOC-ALDi), providing a more nuanced picture than traditional DI systems. Through case studies, we illustrate how ALDi can reveal Arabic speakers’ stylistic choices in different situations, a useful property for sociolinguistic analyses.
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models
Amr Keleg | Walid Magdy
Findings of the Association for Computational Linguistics: ACL 2023
A few benchmarking datasets have been released to evaluate the factual knowledge of pretrained language models. These benchmarks (e.g., LAMA and ParaRel) are mainly developed in English and later translated to form new multilingual versions (e.g., mLAMA and mParaRel). Results on these multilingual benchmarks suggest that using English prompts to recall the facts from multilingual models usually yields significantly better and more consistent performance than using non-English prompts. Our analysis shows that mLAMA is biased toward facts from Western countries, which might affect the fairness of probing models. We propose a new framework for curating factual triples from Wikidata that are culturally diverse. A new benchmark, DLAMA-v1, is built of factual triples from three pairs of contrasting cultures, with a total of 78,259 triples from 20 relation predicates. The three pairs comprise facts representing the (Arab and Western), (Asian and Western), and (South American and Western) countries respectively. Results on the more balanced DLAMA-v1 benchmark support the finding that mBERT performs better on Western facts than non-Western ones, while monolingual Arabic, English, and Korean models tend to perform better on their culturally proximate facts. Moreover, both monolingual and multilingual models tend to make a prediction that is culturally or geographically relevant to the correct label, even if the prediction is wrong.
Arabic Dialect Identification under Scrutiny: Limitations of Single-label Classification
Amr Keleg | Walid Magdy
Proceedings of ArabicNLP 2023
Automatic Arabic Dialect Identification (ADI) of text has gained great popularity since it was introduced in the early 2010s. Multiple datasets were developed, and yearly shared tasks have been running since 2018. However, ADI systems are reported to fail in distinguishing between the micro-dialects of Arabic. We argue that the currently adopted framing of the ADI task as a single-label classification problem is one of the main reasons for that. We highlight the incompleteness of the dialect labels as a limitation and demonstrate how it impacts the evaluation of ADI systems. A manual error analysis of the predictions of an ADI system, performed by 7 native speakers of different Arabic dialects, revealed that ≈ 66% of the validated errors are not true errors. Consequently, we propose framing ADI as a multi-label classification task and give recommendations for designing new ADI datasets.
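The evaluation issue raised in this abstract can be sketched with a toy example: a prediction that differs from the single gold label is counted as an error, even when the sentence is also valid in the predicted dialect. The sentences' labels, the valid-label sets, and the function names below are hypothetical and only illustrate the contrast between the two scoring schemes; they are not the paper's data or code.

```python
def single_label_accuracy(predictions, gold_labels):
    """Fraction of predictions that exactly match the single gold label."""
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


def multi_label_aware_accuracy(predictions, valid_label_sets):
    """Fraction of predictions that fall inside the set of valid dialects."""
    correct = sum(p in valid for p, valid in zip(predictions, valid_label_sets))
    return correct / len(valid_label_sets)


# Hypothetical system predictions for four sentences.
predictions = ["Egypt", "Syria", "Jordan", "Iraq"]
# Hypothetical single gold labels (e.g., derived from the author's country).
gold_labels = ["Egypt", "Lebanon", "Palestine", "Morocco"]
# Hypothetical sets of dialects in which native speakers judge each sentence valid.
valid_label_sets = [
    {"Egypt"},
    {"Lebanon", "Syria"},
    {"Palestine", "Jordan", "Syria"},
    {"Morocco"},
]

strict = single_label_accuracy(predictions, gold_labels)
lenient = multi_label_aware_accuracy(predictions, valid_label_sets)
print(f"single-label accuracy: {strict:.2f}, multi-label-aware accuracy: {lenient:.2f}")
```

In this toy setup, two of the three single-label "errors" disappear once the full set of valid dialects is considered, which is the kind of effect the manual error analysis measures.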
2022
SMASH at Qur’an QA 2022: Creating Better Faithful Data Splits for Low-resourced Question Answering Scenarios
Amr Keleg | Walid Magdy
Proceedings of the 5th Workshop on Open-Source Arabic Corpora and Processing Tools with Shared Tasks on Qur'an QA and Fine-Grained Hate Speech Detection
The Qur’an QA 2022 shared task aims at assessing the possibility of building systems that can extract answers to religious questions given relevant passages from the Holy Qur’an. This paper describes SMASH’s system that was used to participate in this shared task. Our experiments reveal a data leakage issue among the different splits of the dataset. This leakage problem hinders the reliability of using the models’ performance on the development dataset as a proxy for the ability of the models to generalize to new unseen samples. After creating better faithful splits from the original dataset, the basic strategy of fine-tuning a language model pretrained on classical Arabic text yielded the best performance on the new evaluation split. The results achieved by the model suggest that the small-scale dataset is not enough to fine-tune large transformer-based language models in a way that generalizes well. Conversely, we believe that further attention could be paid to the type of questions that are being used to train the models given the sensitivity of the data.
Automatically Discarding Straplines to Improve Data Quality for Abstractive News Summarization
Amr Keleg | Matthias Lindemann | Danyang Liu | Wanqiu Long | Bonnie L. Webber
Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP
Recent improvements in automatic news summarization fundamentally rely on large corpora of news articles and their summaries. These corpora are often constructed by scraping news websites, which results in including not only summaries but also other kinds of texts. Apart from more generic noise, we identify straplines as a form of text scraped from news websites that commonly turn out not to be summaries. The presence of these non-summaries threatens the validity of scraped corpora as benchmarks for news summarization. We have annotated extracts from two news sources that form part of the Newsroom corpus (Grusky et al., 2018), labeling those which were straplines, those which were summaries, and those which were both. We present a rule-based strapline detection method that achieves good performance on a manually annotated test set. Automatic evaluation indicates that removing straplines and noise from the training data of a news summarizer results in higher quality summaries, with improvements as high as 7 points ROUGE score.
2020
An Unsupervised Method for Weighting Finite-state Morphological Analyzers
Amr Keleg | Francis M. Tyers | Nicholas Howell | Tommi A. Pirinen
Proceedings of the Twelfth Language Resources and Evaluation Conference
Morphological analysis is one of the tasks that have been studied for years. Different techniques have been used to develop models for performing morphological analysis. Models based on finite state transducers have proved to be more suitable for low-resource languages. In this paper, we have developed a method for weighting a morphological analyzer built using finite state transducers in order to disambiguate its results. The method is based on a word2vec model that is trained in a completely unsupervised way using raw untagged corpora and is able to capture the semantic meaning of the words. Most of the methods used for disambiguating the results of a morphological analyzer rely on tagged corpora that need to be manually built. Additionally, the method developed uses information about the token irrespective of its context, unlike most of the other techniques, which heavily rely on the word’s context to disambiguate its set of candidate analyses.
ASU_OPTO at OSACT4 - Offensive Language Detection for Arabic text
Amr Keleg | Samhaa R. El-Beltagy | Mahmoud Khalil
Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection
In recent years, toxic comments and offensive speech have been polluting the internet, and manual inspection of these comments is becoming a tiresome task to manage. A machine-learning-based model that is able to filter offensive Arabic content is much needed nowadays. In this paper, we describe the model that was submitted to the Shared Task on Offensive Language Detection organized by the 4th Workshop on Open-Source Arabic Corpora and Processing Tools. Our model makes use of a transformer-based model (BERT) to detect offensive content. We came in fourth place in subtask A (detecting Offensive Speech) and in third place in subtask B (detecting Hate Speech).
Co-authors
- Walid Magdy 8
- Sharon Goldwater 3
- Samhaa R. El-Beltagy 2
- Nizar Habash 2
- Fajri Koto 2
- Preslav Nakov 2
- Ahmed Abdelali 1
- Muhammad Abdul-Mageed 1
- Idris Abdulmumin 1
- Ibrahim Abu Farha 1
- Abdulhamid Abubakar 1
- Faisal Muhammad Adam 1
- Esther Adenuga 1
- Amit Agarwal 1
- Cristina Aggazzotti 1
- Sarfraz Ahmad 1
- Raia Abu Ahmad 1
- Momina Ahsan 1
- Akshata 1
- Muhammad Dehan Al Kautsar 1
- Mohammad Rustom Al Nasar 1
- Hend Al-Khalifa 1
- Jesujoba Alabi 1
- Rawan Almatham 1
- Saeed Almheiri 1
- Reem Alqifari 1
- Azril Hafizi Amirudin 1
- Cynthia Jayne Amol 1
- Nicholas Andrews 1
- David Anugraha 1
- Michael Anugraha 1
- Mohamed Anwar 1
- Catherine Arnett 1
- Mithil Bangera 1
- Yeshil Bangera 1
- Houda Bouamor 1
- Weerayut Buaphet 1
- Laurie Burchell 1
- Lanwenn ar C'horr 1
- Hande Celikkanat 1
- Kranti Chalamalasetti 1
- Leshem Choshen 1
- Thibault Clérice 1
- Rasul Dent 1
- Ryandito Diandaru 1
- Konstantin Dobler 1
- Karan Dua 1
- Omar El Herraoui 1
- Mohamed El Zeftawy 1
- Bilal Elbouardi 1
- Abdelrahim Elmadany 1
- Kareem Elzeky 1
- Biaoyan Fang 1
- Vicky Feliren 1
- Luca Foppiano 1
- Abed Alhakim Freihat 1
- Lea Frermann 1
- Dmitry Gaynullin 1
- Manuel Goulão 1
- Tommaso Green 1
- Muhammad Ravi Shulthan Habibi 1
- Hatem Haddad 1
- Injy Hamed 1
- Nadia Ghezaiel Hammouda 1
- Ikhlasul Akmal Hanif 1
- Lara Hassan 1
- Nicholas Howell 1
- Nuhu Ibrahim 1
- Inshirah Idris 1
- Fenal Ashokbhai Ilasariya 1
- Joseph Marvin Imperial 1
- Yanbei Jiang 1
- Kun Kerdthaisong 1
- Ilker Kesen 1
- Jun Kevin 1
- Salam Khalifa 1
- Mahmoud Khalil 1
- Bruhan Kyomuhendo 1
- Jey Han Lau 1
- Yiyuan Li 1
- Junhong Liang 1
- Matthias Lindemann 1
- Danyang Liu 1
- Wanqiu Long 1
- Sarah K. K. Luger 1
- Jean Maillard 1
- Kamohelo Makaaka 1
- Vukosi Marivate 1
- Juan Pablo Martínez 1
- Ali Mekky 1
- Sara Hincapié Monsalve 1
- Rafael Mosquera 1
- Khalil Mrini 1
- Carol Muchemi 1
- Shamsuddeen Hassan Muhammad 1
- Kenton Murray 1
- Ahmad Mustafid 1
- Casper Rufaro Muziri 1
- Hamada Nayel 1
- Khang Nguyen 1
- My Chiffon Nguyen 1
- Melika Nobakhtian 1
- Shu Okabe 1
- Pedro Ortiz Suarez 1
- Malte Ostendorff 1
- Verrah Akinyi Otiende 1
- Quentin Pagès 1
- Srikant Panda 1
- Hitesh Laxmichand Patel 1
- Flammie A. Pirinen 1
- Vallerie Alexandra Putra 1
- Ingrid Gabriela Franco Ramirez 1
- Benjamin L Rice 1
- Mattes Ruckdeschel 1
- Daniel Ruffinelli 1
- Benoît Sagot 1
- Luis Frentzen Salim 1
- Younes Samih 1
- Saron Samuel 1
- Hassan Sawaf 1
- Jakhongir Saydaliev 1
- Pavel Stepachev 1
- Damian Stewart 1
- Sotaro Takeshita 1
- Filbert Aurelian Tjiaranata 1
- Nadi Tomeh 1
- Atnafu Lambebo Tonja 1
- Yassine Toughrai 1
- Francis Tyers 1
- Gouthami Vadithya 1
- Sowmya Vajjala 1
- Rob Van Der Goot 1
- Thom Vaughan 1
- Ahmad Mustapha Wali 1
- Azmine Toushik Wasi 1
- Bonnie Webber 1
- Genta Indra Winata 1
- Tack Hwa Wong 1
- Zhuohan Xie 1
- Andrew Yates 1
- Seid Muhie Yimam 1
- Wajdi Zaghouani 1
- Chiyu Zhang 1
- Mike Zhang 1
- Ej Zhou 1
- Imed Zitouni 1