Quantifying Metric and Model Agreement in Bias Evaluation of Large Language Models

Arash Asgari, Huan Wu, Amirreza Naziri, Mojtaba Kolahdouzi, Laleh Seyyed-Kalantari


Abstract
Bias evaluation in large language models (LLMs) uses many metrics and benchmarks, but lacks a systematic way to measure agreement across bias metrics and models. As a result, improvements observed under one metric may be contradicted by another, and model rankings may reflect benchmark-specific artifacts rather than stable bias profiles. In this work, we introduce the Metric Agreement Score (MeAS) and the Model Agreement Score (MoAS), which quantify cross-metric and cross-model agreement in bias rankings, respectively. We apply these measures to eight LLMs, seven bias metrics, and nine corpora. Our results reveal disagreement among both metrics and models. Contrary to expectations, metrics within the same category (generation-based or probabilistic) often behave independently of each other. For instance, HONEST is independent of the toxicity metrics, and the Context Association Test shows no correlation with the Language Modeling Bias metric. At the model level, DeepSeek-family models invert bias rankings relative to most others, indicating that model family strongly shapes specific bias profiles. These findings challenge the assumption that bias mitigation is universally transferable and highlight the need for agreement-aware evaluation.
Anthology ID:
2026.acl-long.769
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
ACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
16868–16933
Language:
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.769/
DOI:
Bibkey:
Cite (ACL):
Arash Asgari, Huan Wu, Amirreza Naziri, Mojtaba Kolahdouzi, and Laleh Seyyed-Kalantari. 2026. Quantifying Metric and Model Agreement in Bias Evaluation of Large Language Models. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16868–16933, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Quantifying Metric and Model Agreement in Bias Evaluation of Large Language Models (Asgari et al., ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-long.769.pdf
Checklist:
2026.acl-long.769.checklist.pdf