Abstract
The fairness and trustworthiness of Large Language Models (LLMs) are receiving increasing attention. Implicit hate speech, which uses indirect language to convey hateful intent, accounts for a substantial share of hate speech in practice, yet the extent to which LLMs handle it effectively remains insufficiently examined. This paper investigates the ability of LLMs to detect implicit hate speech and to express confidence in their responses. Our evaluation covers diverse prompt patterns and mainstream uncertainty estimation methods. We find that LLMs exhibit two extremes: (1) they are excessively sensitive to groups and topics associated with fairness concerns, misclassifying benign statements as hate speech; and (2) under every uncertainty estimation method, their confidence scores concentrate within a fixed range and remain unchanged regardless of dataset complexity, so calibration performance depends heavily on the underlying classification accuracy. These findings reveal new limitations of LLMs and underscore the need for caution when optimizing models so that they do not veer toward either extreme. They also serve as a reminder to weigh sensitivity and confidence carefully in the pursuit of model fairness.
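For context on what "calibration performance" measures here: a standard metric is Expected Calibration Error (ECE), which bins predictions by stated confidence and averages the gap between confidence and empirical accuracy in each bin. The sketch below is a minimal illustration, not code from the paper; the uniform-confidence example mimics the score concentration the abstract describes, showing that such scores calibrate well only when accuracy happens to land in the same range.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by confidence, then take the
    sample-weighted average of |mean confidence - accuracy| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

# Hypothetical example: confidence scores concentrated at 0.9, as the
# abstract reports. Calibration then hinges entirely on whether the
# model's accuracy also happens to be ~0.9.
rng = np.random.default_rng(0)
conf = np.full(1000, 0.9)
acc_matches = rng.random(1000) < 0.9  # accuracy ~0.9 -> low ECE
acc_lower   = rng.random(1000) < 0.6  # accuracy ~0.6 -> high ECE (~0.3)
print(expected_calibration_error(conf, acc_matches))
print(expected_calibration_error(conf, acc_lower))
```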
- Anthology ID: 2024.acl-long.652
- Volume: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: August
- Year: 2024
- Address: Bangkok, Thailand
- Editors: Lun-Wei Ku, Andre Martins, Vivek Srikumar
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 12073–12086
- URL: https://aclanthology.org/2024.acl-long.652
- DOI: 10.18653/v1/2024.acl-long.652
- Cite (ACL): Min Zhang, Jianfeng He, Taoran Ji, and Chang-Tien Lu. 2024. Don’t Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12073–12086, Bangkok, Thailand. Association for Computational Linguistics.
- Cite (Informal): Don’t Go To Extremes: Revealing the Excessive Sensitivity and Calibration Limitations of LLMs in Implicit Hate Speech Detection (Zhang et al., ACL 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.acl-long.652.pdf