Uncertainty Estimation in Large Language Models to Support Biodiversity Conservation

Maria Mora-Cross, Saul Calderon-Ramirez


Abstract
Large Language Models (LLMs) provide significant value in question answering (QA) scenarios and have practical applications in complex decision-making contexts, such as biodiversity conservation. However, despite substantial performance improvements, they may still produce inaccurate outcomes. Consequently, incorporating uncertainty quantification alongside predictions is essential for mitigating the potential risks associated with their use. This study presents an exploratory analysis of the application of Monte Carlo Dropout (MCD) and Expected Calibration Error (ECE) to assess the uncertainty of generative language models. To that end, we analyzed two publicly available language models (Falcon-7B and DistilGPT-2). Our findings suggest that ECE is a viable metric for estimating uncertainty in generative LLMs. This research contributes to a broader project that aims to facilitate free and open access to standardized and integrated data and services about Costa Rica’s biodiversity to support the development of science, education, and biodiversity conservation.
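For readers unfamiliar with the metric, ECE is commonly computed by grouping predictions into confidence bins and taking the weighted average of the gap between accuracy and mean confidence in each bin. The following is a minimal sketch of that common formulation, not the authors' implementation: the function name, the equal-width binning, and the default of 10 bins are assumptions made for illustration, and the abstract does not specify how confidences were obtained from the MCD samples.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Illustrative ECE with equal-width bins over [0, 1].

    confidences: predicted confidence per answer, each in [0, 1]
    correct:     booleans, True where the answer was right
    n_bins:      number of bins (assumed value, not from the paper)
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have equal length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin; a confidence of exactly 1.0
    # falls into the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))

    # Weighted average of |accuracy - mean confidence| over non-empty bins.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, h in members if h) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A value near 0 indicates that stated confidence tracks observed accuracy; larger values indicate over- or under-confidence.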
Anthology ID:
2024.naacl-industry.31
Volume:
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yi Yang, Aida Davani, Avi Sil, Anoop Kumar
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
368–378
URL:
https://aclanthology.org/2024.naacl-industry.31
Cite (ACL):
Maria Mora-Cross and Saul Calderon-Ramirez. 2024. Uncertainty Estimation in Large Language Models to Support Biodiversity Conservation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 6: Industry Track), pages 368–378, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
Uncertainty Estimation in Large Language Models to Support Biodiversity Conservation (Mora-Cross & Calderon-Ramirez, NAACL 2024)
PDF:
https://preview.aclanthology.org/jeptaln-2024-ingestion/2024.naacl-industry.31.pdf