A Holistic Assessment of the Carbon Footprint of Noor, a Very Large Arabic Language Model

Imad Lakim, Ebtesam Almazrouei, Ibrahim Abualhaol, Merouane Debbah, Julien Launay


Abstract
As ever larger language models grow more ubiquitous, it is crucial to consider their environmental impact. Characterised by extreme size and resource use, recent generations of models have been criticised for their voracious appetite for compute, and thus for their significant carbon footprint. Although reporting of carbon impact has grown more common in machine learning papers, this reporting is usually limited to the compute resources used strictly for training. In this work, we propose a holistic assessment of the footprint of an extreme-scale language model, Noor. Noor is an ongoing project aiming to develop the largest multi-task Arabic language models (with up to 13B parameters), leveraging zero-shot generalisation to enable a wide range of downstream tasks via natural language instructions. We assess the total carbon bill of the entire project: starting with data collection and storage costs, including research and development budgets, pretraining costs, future serving estimates, and other exogenous costs necessary for this international cooperation. Notably, we find that inference costs and exogenous factors can have a significant impact on the total budget. Finally, we discuss pathways to reduce the carbon footprint of extreme-scale models.
Anthology ID:
2022.bigscience-1.8
Volume:
Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models
Month:
May
Year:
2022
Address:
virtual+Dublin
Venue:
BigScience
Publisher:
Association for Computational Linguistics
Pages:
84–94
URL:
https://aclanthology.org/2022.bigscience-1.8
DOI:
10.18653/v1/2022.bigscience-1.8
Cite (ACL):
Imad Lakim, Ebtesam Almazrouei, Ibrahim Abualhaol, Merouane Debbah, and Julien Launay. 2022. A Holistic Assessment of the Carbon Footprint of Noor, a Very Large Arabic Language Model. In Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models, pages 84–94, virtual+Dublin. Association for Computational Linguistics.
Cite (Informal):
A Holistic Assessment of the Carbon Footprint of Noor, a Very Large Arabic Language Model (Lakim et al., BigScience 2022)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2022.bigscience-1.8.pdf
Video:
https://preview.aclanthology.org/emnlp-22-attachments/2022.bigscience-1.8.mp4
Data
CCNet