Confidently Wrong: Exploring the Calibration and Expression of (Un)Certainty of Large Language Models in a Multilingual Setting
Lea Krause, Wondimagegnhue Tufa, Selene Baez Santamaria, Angel Daza, Urja Khurana, Piek Vossen
Abstract
While the fluency and coherence of Large Language Models (LLMs) in text generation have seen significant improvements, their competency in generating appropriate expressions of uncertainty remains limited.Using a multilingual closed-book QA task and GPT-3.5, we explore how well LLMs are calibrated and express certainty across a diverse set of languages, including low-resource settings. Our results reveal strong performance in high-resource languages but a marked decline in performance in lower-resource languages. Across all, we observe an exaggerated expression of confidence in the model, which does not align with the correctness or likelihood of its responses. Our findings highlight the need for further research into accurate calibration of LLMs especially in a multilingual setting.- Anthology ID:
- 2023.mmnlg-1.1
- Volume:
- Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023)
- Month:
- September
- Year:
- 2023
- Address:
- Prague, Czech Republic
- Editors:
- Albert Gatt, Claire Gardent, Liam Cripwell, Anya Belz, Claudia Borg, Aykut Erdem, Erkut Erdem
- Venues:
- MMNLG | WS
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1–9
- Language:
- URL:
- https://preview.aclanthology.org/remove-affiliations/2023.mmnlg-1.1/
- DOI:
- Cite (ACL):
- Lea Krause, Wondimagegnhue Tufa, Selene Baez Santamaria, Angel Daza, Urja Khurana, and Piek Vossen. 2023. Confidently Wrong: Exploring the Calibration and Expression of (Un)Certainty of Large Language Models in a Multilingual Setting. In Proceedings of the Workshop on Multimodal, Multilingual Natural Language Generation and Multilingual WebNLG Challenge (MM-NLG 2023), pages 1–9, Prague, Czech Republic. Association for Computational Linguistics.
- Cite (Informal):
- Confidently Wrong: Exploring the Calibration and Expression of (Un)Certainty of Large Language Models in a Multilingual Setting (Krause et al., MMNLG 2023)
- PDF:
- https://preview.aclanthology.org/remove-affiliations/2023.mmnlg-1.1.pdf