Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM
Sahal Mullappilly, Abdelrahman Shaker, Omkar Thawakar, Hisham Cholakkal, Rao Anwer, Salman Khan, Fahad Khan
Abstract
Climate change is one of the most significant challenges we face together as a society. Creating awareness and educating policy makers about the wide-ranging impact of climate change is an essential step towards a sustainable future. Recently, Large Language Models (LLMs) like ChatGPT and Bard have shown impressive conversational abilities and excel in a wide variety of NLP tasks. While these models are closed-source, alternative open-source LLMs such as Stanford Alpaca and Vicuna have recently shown promising results. However, these open-source models are not specifically tailored to climate-related, domain-specific information, and they also struggle to generate meaningful responses in other languages such as Arabic. To this end, we propose a lightweight Arabic Mini-ClimateGPT that is built on an open-source LLM and is specifically fine-tuned on Clima500-Instruct, a curated conversational-style Arabic instruction-tuning dataset with over 500k instructions about climate change and sustainability. Further, our model also utilizes a vector embedding based retrieval mechanism during inference. We validate our proposed model through quantitative and qualitative evaluations on climate-related queries. Our model surpasses the baseline LLM in 88.3% of cases during ChatGPT-based evaluation. Furthermore, our human expert evaluation reveals an 81.6% preference for our model’s responses over multiple popular open-source models. Our open-source demos, models and curated instruction sets are available here: https://github.com/mbzuai-oryx/ClimateGPT
- Anthology ID:
- 2023.findings-emnlp.941
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 14126–14136
- URL:
- https://aclanthology.org/2023.findings-emnlp.941
- DOI:
- 10.18653/v1/2023.findings-emnlp.941
- Cite (ACL):
- Sahal Mullappilly, Abdelrahman Shaker, Omkar Thawakar, Hisham Cholakkal, Rao Anwer, Salman Khan, and Fahad Khan. 2023. Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 14126–14136, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Arabic Mini-ClimateGPT : A Climate Change and Sustainability Tailored Arabic LLM (Mullappilly et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2023.findings-emnlp.941.pdf