Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models

George Kour, Itay Nakash, Michal Shmueli-Scheuer, Ateret Anaby Tavor


Abstract
As Large Language Models (LLMs) become deeply integrated into human life and increasingly influence decision-making, it’s crucial to evaluate whether and to what extent they exhibit subjective preferences, opinions, and beliefs. These tendencies may stem from biases within the models, which may shape their behavior, influence the advice and recommendations they offer to users, and potentially reinforce certain viewpoints. This paper presents the Preference, Opinion, and Belief survey (POBs), a benchmark developed to assess LLMs’ subjective inclinations across societal, cultural, ethical, and personal domains. We applied our benchmark to evaluate leading open- and closed-source LLMs, measuring desired properties such as reliability, neutrality, and consistency. In addition, we investigated the effect of increasing the test-time compute, through reasoning and self-reflection mechanisms, on those metrics. While effective in other tasks, our results show that these mechanisms offer only limited gains in our domain. Furthermore, we reveal that newer model versions are becoming less consistent and more biased toward specific viewpoints, highlighting a blind spot and a concerning trend. POBS: https://ibm.github.io/POBS
Anthology ID:
2025.acl-industry.45
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Georg Rehm, Yunyao Li
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
639–660
URL:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.acl-industry.45/
Cite (ACL):
George Kour, Itay Nakash, Michal Shmueli-Scheuer, and Ateret Anaby Tavor. 2025. Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 639–660, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
Think Again! The Effect of Test-Time Compute on Preferences, Opinions, and Beliefs of Large Language Models (Kour et al., ACL 2025)
PDF:
https://preview.aclanthology.org/acl25-workshop-ingestion/2025.acl-industry.45.pdf