SycoBench-600: Measuring Sycophancy and Correction Selectivity in LLM Assistants

Debu Sinha


Abstract

Modern instruction-following language models are optimized to be helpful and cooperative, often through preference-based alignment such as RLHF and related methods. A growing body of evidence shows that this training can also induce sycophancy: models may agree with a user even when the user is wrong, undermining reliability in decision support and high-stakes advice. We introduce SycoBench-600, a controlled multiple-choice benchmark that measures (i) susceptibility to three social-pressure perturbations (doubt, authority, and an explicit wrong suggestion) and (ii) correction selectivity, the ability to accept correct suggestions while resisting incorrect ones. The released benchmark contains 600 English MCQ instances over 272 normalized question stems, covers 8 domains and 3 difficulty tiers, and evaluates each instance under 3 fixed paraphrase variants of the perturbation prompts. We evaluate seven widely used assistants spanning proprietary and open-weight families. Results show substantial variation in pressure robustness and selective updating, and further show that willingness to update does not by itself imply selectivity. We release raw logs, validation scripts, and code that regenerates every table and figure from the model outputs.
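The abstract does not spell out how correction selectivity is scored. As a rough illustration only, here is a minimal sketch that assumes selectivity is the acceptance rate of correct suggestions minus the acceptance rate of incorrect ones. The `SuggestionTrial` record, its field names, and the `selectivity` formula are hypothetical and are not the paper's released schema or metric. The sketch shows why willingness to update does not imply selectivity: a model that always defers to the user accepts every suggestion but scores zero.

```python
from dataclasses import dataclass


# Hypothetical record for one suggestion trial; field names are illustrative,
# not the released SycoBench-600 log schema.
@dataclass
class SuggestionTrial:
    suggested: str     # option letter the user proposes, e.g. "B"
    gold: str          # correct option letter
    final_answer: str  # model's answer after seeing the suggestion


def acceptance_rate(trials):
    """Fraction of trials in which the model adopts the suggested option."""
    if not trials:
        return float("nan")
    return sum(t.final_answer == t.suggested for t in trials) / len(trials)


def selectivity(trials):
    """Assumed score: acceptance of correct suggestions minus acceptance of wrong ones.

    Ranges from -1 to 1. A score of 1 means every correct suggestion is accepted
    and every wrong suggestion is resisted.
    """
    correct = [t for t in trials if t.suggested == t.gold]
    wrong = [t for t in trials if t.suggested != t.gold]
    return acceptance_rate(correct) - acceptance_rate(wrong)


# A model that always defers to the user: fully willing to update, zero selectivity.
always_agree = [
    SuggestionTrial(suggested="A", gold="A", final_answer="A"),
    SuggestionTrial(suggested="B", gold="A", final_answer="B"),
]

# A model that accepts only the correct suggestion.
selective = [
    SuggestionTrial(suggested="A", gold="A", final_answer="A"),
    SuggestionTrial(suggested="B", gold="A", final_answer="A"),
]

print(selectivity(always_agree))  # 0.0
print(selectivity(selective))     # 1.0
```

Subtracting the two rates, rather than reporting overall acceptance, is what separates a selective model from one that simply agrees more often. The paper's actual metric may be defined differently.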

Anthology ID:: 2026.findings-acl.1759
Volume:: Findings of the Association for Computational Linguistics: ACL 2026
Month:: July
Year:: 2026
Address:: San Diego, California, United States
Editors:: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:: Findings
Publisher:: Association for Computational Linguistics
Pages:: 35278–35284
URL:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1759/
Cite (ACL):: Debu Sinha. 2026. SycoBench-600: Measuring Sycophancy and Correction Selectivity in LLM Assistants. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35278–35284, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):: SycoBench-600: Measuring Sycophancy and Correction Selectivity in LLM Assistants (Sinha, Findings 2026)
PDF:: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1759.pdf
Checklist:: 2026.findings-acl.1759.checklist.pdf
