Halwasa: Quantify and Analyze Hallucinations in Large Language Models: Arabic as a Case Study

Abstract
Large Language Models (LLMs) have shown a remarkable ability to generate text that is, in many cases, indistinguishable from human-written text. However, they sometimes produce false, incorrect, or misleading content, a phenomenon often described as “hallucination”. Quantifying and analyzing hallucinations in LLMs can increase their reliability and adoption. While hallucination is actively studied for English and other languages, and several benchmark datasets have been created, this area has not been studied at all for Arabic. In this paper, we create the first Arabic dataset of its kind, containing 10K sentences generated by LLMs, and annotate it for factuality and correctness. We provide a detailed analysis of the dataset, examining factual and linguistic errors. We found that 25% of the generated sentences are factually incorrect. We share the dataset with the research community.
- Anthology ID:
- 2024.lrec-main.705
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- Publisher:
- ELRA and ICCL
- Pages:
- 8008–8015
- URL:
- https://aclanthology.org/2024.lrec-main.705
- Cite (ACL):
- Hamdy Mubarak, Hend Al-Khalifa, and Khaloud Suliman Alkhalefah. 2024. Halwasa: Quantify and Analyze Hallucinations in Large Language Models: Arabic as a Case Study. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 8008–8015, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Halwasa: Quantify and Analyze Hallucinations in Large Language Models: Arabic as a Case Study (Mubarak et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/landing_page/2024.lrec-main.705.pdf
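
The abstract reports that 25% of the 10K generated sentences are factually incorrect. As a rough illustration only, the sketch below shows how such a share could be computed from a hypothetical CSV export of the annotations; the file name (halwasa_annotations.csv), column name (factual_label), and label values ("correct"/"incorrect") are assumptions, not the dataset's actual schema.

```python
import csv
from collections import Counter

# Hypothetical file and column names; the released dataset may use a
# different format and different annotation labels.
PATH = "halwasa_annotations.csv"


def incorrect_share(path: str) -> float:
    """Return the fraction of generated sentences annotated as factually incorrect."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row["factual_label"]] += 1  # e.g. "correct" / "incorrect"
    total = sum(counts.values())
    return counts["incorrect"] / total if total else 0.0


if __name__ == "__main__":
    print(f"Factually incorrect: {incorrect_share(PATH):.1%}")
```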