Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors

Anthony Sicilia; Malihe Alikhani

doi:10.18653/v1/2024.nlp4pi-1.19

Abstract

Conversation forecasting tasks a model with predicting the outcome of an unfolding conversation. For instance, it can be applied in social media moderation to predict harmful user behaviors before they occur, allowing for preventative interventions. While large language models (LLMs) have recently been proposed as an effective tool for conversation forecasting, it’s unclear what biases they may have, especially against forecasting the (potentially harmful) outcomes we request them to predict during moderation. This paper explores to what extent model uncertainty can be used as a tool to mitigate potential biases. Specifically, we ask three primary research questions: 1) how does LLM forecasting accuracy change when we ask models to represent their uncertainty; 2) how does LLM bias change when we ask models to represent their uncertainty; 3) how can we use uncertainty representations to reduce or completely mitigate biases without many training data points. We address these questions for 5 open-source language models tested on 2 datasets designed to evaluate conversation forecasting for social media moderation.

Anthology ID: 2024.nlp4pi-1.19
Volume: Proceedings of the Third Workshop on NLP for Positive Impact
Month: November
Year: 2024
Address: Miami, Florida, USA
Editors: Daryna Dementieva, Oana Ignat, Zhijing Jin, Rada Mihalcea, Giorgio Piatti, Joel Tetreault, Steven Wilson, Jieyu Zhao
Venues: NLP4PI | WS
Publisher: Association for Computational Linguistics
Pages: 211–223
URL: https://preview.aclanthology.org/ingest-emnlp/2024.nlp4pi-1.19/
DOI: 10.18653/v1/2024.nlp4pi-1.19
Cite (ACL): Anthony Sicilia and Malihe Alikhani. 2024. Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors. In Proceedings of the Third Workshop on NLP for Positive Impact, pages 211–223, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal): Eliciting Uncertainty in Chain-of-Thought to Mitigate Bias against Forecasting Harmful User Behaviors (Sicilia & Alikhani, NLP4PI 2024)
PDF: https://preview.aclanthology.org/ingest-emnlp/2024.nlp4pi-1.19.pdf
