Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM

Sahal Mullappilly, Abdelrahman Shaker, Omkar Thawakar, Hisham Cholakkal, Rao Anwer, Salman Khan, Fahad Khan


Abstract
Climate change is one of the most significant challenges we face together as a society. Creating awareness and educating policymakers about the wide-ranging impact of climate change is an essential step towards a sustainable future. Recently, Large Language Models (LLMs) like ChatGPT and Bard have shown impressive conversational abilities and excel in a wide variety of NLP tasks. While these models are closed-source, alternative open-source LLMs such as Stanford Alpaca and Vicuna have recently shown promising results. However, these open-source models are not specifically tailored to climate-related, domain-specific information, and they also struggle to generate meaningful responses in other languages such as Arabic. To this end, we propose a lightweight Arabic Mini-ClimateGPT that is built on an open-source LLM and is specifically fine-tuned on Clima500-Instruct, a curated Arabic conversational-style instruction-tuning dataset with over 500k instructions about climate change and sustainability. Further, our model also utilizes a vector-embedding-based retrieval mechanism during inference. We validate our proposed model through quantitative and qualitative evaluations on climate-related queries. Our model surpasses the baseline LLM in 88.3% of cases during ChatGPT-based evaluation. Furthermore, our human expert evaluation reveals an 81.6% preference for our model’s responses over multiple popular open-source models. Our open-source demos, models and curated instruction sets are available at https://github.com/mbzuai-oryx/ClimateGPT
Anthology ID:
2023.findings-emnlp.941
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2023
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
14126–14136
URL:
https://aclanthology.org/2023.findings-emnlp.941
DOI:
10.18653/v1/2023.findings-emnlp.941
Cite (ACL):
Sahal Mullappilly, Abdelrahman Shaker, Omkar Thawakar, Hisham Cholakkal, Rao Anwer, Salman Khan, and Fahad Khan. 2023. Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14126–14136, Singapore. Association for Computational Linguistics.
Cite (Informal):
Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM (Mullappilly et al., Findings 2023)
PDF:
https://preview.aclanthology.org/naacl24-info/2023.findings-emnlp.941.pdf