Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions

Angana Borah, Rada Mihalcea


Abstract
As Large Language Models (LLMs) continue to evolve, they are increasingly being employed in numerous studies to simulate societies and execute diverse social tasks. However, LLMs are susceptible to societal biases due to their exposure to human-generated data. Given that LLMs are being used to gain insights into various societal aspects, it is essential to mitigate these biases. To that end, our study investigates the presence of implicit gender biases in multi-agent LLM interactions and proposes two strategies to mitigate these biases. We begin by creating a dataset of scenarios where implicit gender biases might arise, and subsequently develop a metric to assess the presence of biases. Our empirical analysis reveals that LLMs generate outputs characterized by strong implicit bias associations (approximately 50% of the time or more). Furthermore, these biases tend to escalate following multi-agent interactions. To mitigate them, we propose two strategies: self-reflection with in-context examples (ICE) and supervised fine-tuning. Our research demonstrates that both methods effectively mitigate implicit biases, with the ensemble of fine-tuning and self-reflection proving to be the most successful.
Anthology ID:
2024.findings-emnlp.545
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2024
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
9306–9326
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-emnlp.545/
DOI:
10.18653/v1/2024.findings-emnlp.545
Cite (ACL):
Angana Borah and Rada Mihalcea. 2024. Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 9306–9326, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions (Borah & Mihalcea, Findings 2024)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2024.findings-emnlp.545.pdf
Software:
2024.findings-emnlp.545.software.zip
Data:
2024.findings-emnlp.545.data.zip