Guilherme Penedo


2023

AlGhafa Evaluation Benchmark for Arabic Language Models
Ebtesam Almazrouei | Ruxandra Cojocaru | Michele Baldo | Quentin Malartic | Hamza Alobeidli | Daniele Mazzotta | Guilherme Penedo | Giulia Campesan | Mugariya Farooq | Maitha Alhammadi | Julien Launay | Badreddine Noune
Proceedings of ArabicNLP 2023

Recent advances in Arabic large language models have opened up a wealth of potential practical applications. Driven by optimal training strategies, large-scale data acquisition, and continuously increasing NLP resources, the Arabic LLM landscape has improved in a very short span of time, despite being plagued by training data scarcity and limited evaluation resources compared to English. To contribute to this ever-growing field, we introduce AlGhafa, a new multiple-choice evaluation benchmark for Arabic LLMs. For showcasing purposes, we train a new suite of models, including a 14 billion parameter model, the largest monolingual Arabic decoder-only model to date. We use a collection of publicly available datasets, as well as a newly introduced HandMade dataset consisting of 8 billion tokens. Finally, we explore the quantitative and qualitative toxicity of several Arabic models, comparing our models to existing public Arabic LLMs.