Can Reasoning Help Large Language Models Capture Human Annotator Disagreement?
Jingwei Ni, Yu Fan, Vilém Zouhar, Donya Rooein, Alexander Miserlis Hoyle, Mrinmaya Sachan, Markus Leippold, Dirk Hovy, Elliott Ash
Abstract
Variation in human annotation (i.e., disagreement) is common in NLP and often reflects important information such as task subjectivity and sample ambiguity. Modeling this variation matters for applications that are sensitive to such information. Although reasoning trained with Reinforcement Learning with Verifiable Rewards (RLVR) has improved Large Language Model (LLM) performance on many tasks, it remains unclear whether such reasoning enables LLMs to capture informative variation in human annotation. In this work, we evaluate the influence of different reasoning settings on LLM disagreement modeling. We systematically evaluate each reasoning setting across model sizes, distribution expression methods, and steering methods, resulting in 60 experimental setups across 3 tasks. Surprisingly, our results show that RLVR-style reasoning degrades performance in disagreement modeling, whereas naive Chain-of-Thought (CoT) reasoning improves the performance of LLMs trained with Reinforcement Learning from Human Feedback (RLHF). These findings underscore the potential risk of replacing human annotators with reasoning LLMs, especially when disagreements are important.
- Anthology ID: 2026.eacl-long.3
- Volume: Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month: March
- Year: 2026
- Address: Rabat, Morocco
- Editors: Vera Demberg, Kentaro Inui, Lluís Màrquez
- Venue: EACL
- Publisher: Association for Computational Linguistics
- Pages: 36–54
- URL: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.3/
- Cite (ACL): Jingwei Ni, Yu Fan, Vilém Zouhar, Donya Rooein, Alexander Miserlis Hoyle, Mrinmaya Sachan, Markus Leippold, Dirk Hovy, and Elliott Ash. 2026. Can Reasoning Help Large Language Models Capture Human Annotator Disagreement?. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 36–54, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal): Can Reasoning Help Large Language Models Capture Human Annotator Disagreement? (Ni et al., EACL 2026)
- PDF: https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.3.pdf