Arbiters of Ambivalence: Challenges of using LLMs in No-Consensus tasks

Bhaktipriya Radharapu, Manon Revel, Megan Ung, Sebastian Ruder, Adina Williams

Abstract
The increasing use of LLMs as substitutes for humans in "aligning" LLMs has raised questions about their ability to replicate human judgments and preferences, especially in ambivalent scenarios where humans disagree. This study examines the biases and limitations of LLMs in three roles: answer generator, judge, and debater. These roles loosely correspond to previously described alignment frameworks: preference alignment (judge) and scalable oversight (debater), with the answer generator reflecting the typical user-interaction setting. We develop a "no-consensus" benchmark by curating examples that encompass a variety of a priori ambivalent scenarios, each presenting two possible stances. Our results show that while LLMs can provide nuanced assessments when generating open-ended answers, they tend to take a stance on no-consensus topics when employed as judges or debaters. These findings underscore the need for more sophisticated methods of aligning LLMs without human oversight, highlighting that LLMs cannot fully capture human disagreement even on topics where humans themselves are divided.
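To make the three-role setup concrete, the sketch below shows how a single two-stance, no-consensus item might be posed to a model in each role. This is an illustrative sketch, not the paper's released code: the example item, the prompts, and the `query_model` helper are all hypothetical placeholders standing in for whatever LLM client and benchmark data one actually uses.

```python
# Minimal sketch (not the authors' code) of the three evaluation roles the
# abstract describes. `query_model` is a hypothetical stand-in for any LLM
# chat-completion call; it is stubbed here so the script runs end to end.

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; swap in a real chat-completion client."""
    return f"[model response to: {prompt[:40]}...]"

# One "no-consensus" item: an a priori ambivalent scenario with two possible
# stances, mirroring the benchmark's format (content here is illustrative).
item = {
    "scenario": "Should a small town ban cars from its historic center?",
    "stance_a": "Yes: it improves safety and air quality.",
    "stance_b": "No: it hurts local businesses and accessibility.",
}

# Role 1: answer generator -- open-ended prompt, where nuance is possible.
generator_out = query_model(f"Question: {item['scenario']}\nAnswer:")

# Role 2: judge -- forced pairwise comparison, as in preference alignment.
judge_out = query_model(
    f"Question: {item['scenario']}\n"
    f"A: {item['stance_a']}\nB: {item['stance_b']}\n"
    "Which answer is better, A or B?"
)

# Role 3: debater -- argue each assigned side, as in scalable-oversight debate.
debate_a = query_model(f"Argue for this stance: {item['stance_a']}")
debate_b = query_model(f"Argue for this stance: {item['stance_b']}")

for role, out in [("generator", generator_out), ("judge", judge_out),
                  ("debater (A)", debate_a), ("debater (B)", debate_b)]:
    print(role, "->", out)
```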
Anthology ID: 2025.findings-acl.243
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 4677–4731
URL: https://preview.aclanthology.org/landing_page/2025.findings-acl.243/
Cite (ACL): Bhaktipriya Radharapu, Manon Revel, Megan Ung, Sebastian Ruder, and Adina Williams. 2025. Arbiters of Ambivalence: Challenges of using LLMs in No-Consensus tasks. In Findings of the Association for Computational Linguistics: ACL 2025, pages 4677–4731, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Arbiters of Ambivalence: Challenges of using LLMs in No-Consensus tasks (Radharapu et al., Findings 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.findings-acl.243.pdf