TitleTrap: Probing Presentation Bias in LLM-Based Scientific Reviewing

Shurui Du


Abstract
Large language models (LLMs) are now used in scientific peer review, but their judgments can still be influenced by how information is presented. We study how the style of a paper’s title affects the way LLMs score scientific work. To control for content variation, we build the TitleTrap benchmark using abstracts generated by a language model for common research topics in computer vision and NLP. Each abstract is paired with three titles: a branded colon style, a plain descriptive style, and an interrogative style, while the abstract text remains fixed. We ask GPT-4o and Claude to review these title–abstract pairs under the same instructions. Our results show that title style alone can change the scores: branded titles often receive higher ratings, while interrogative titles sometimes lead to lower assessments of rigor. These findings reveal a presentation bias in LLM-based peer review and suggest the need for better methods to reduce such bias and support fairer automated evaluation.
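The probing protocol described above can be illustrated with a minimal sketch: a fixed abstract is paired with three title variants (branded colon, plain descriptive, interrogative) and each pair is scored under identical review instructions. The prompt wording, 1–10 scoring scale, example titles, and response parsing below are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of a TitleTrap-style probing loop (assumptions: prompt text,
# scoring scale, and parsing are illustrative, not the paper's exact protocol).
from openai import OpenAI

client = OpenAI()

# One LLM-generated abstract, held fixed across all title conditions.
abstract = "We propose a method for ..."

# Three title styles for the same content (examples are hypothetical).
title_variants = {
    "branded":       "TitleTrap: Probing Presentation Bias in LLM Reviewing",
    "descriptive":   "A Study of Presentation Bias in LLM-Based Reviewing",
    "interrogative": "Do Titles Bias LLM-Based Scientific Reviewing?",
}

REVIEW_PROMPT = (
    "You are a peer reviewer. Rate the overall quality of the following "
    "submission on a 1-10 scale and reply with only the number.\n\n"
    "Title: {title}\n\nAbstract: {abstract}"
)

scores = {}
for style, title in title_variants.items():
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # reduce sampling noise so score differences track title style
        messages=[{
            "role": "user",
            "content": REVIEW_PROMPT.format(title=title, abstract=abstract),
        }],
    )
    scores[style] = float(response.choices[0].message.content.strip())

# Compare ratings across styles for the same underlying abstract.
print(scores)
```

Holding the abstract and instructions fixed while varying only the title isolates presentation as the manipulated variable; averaging such scores over many abstracts and repeated runs is what would reveal a systematic gap between title styles.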
Anthology ID:
2025.eval4nlp-1.10
Volume:
Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Mousumi Akter, Tahiya Chowdhury, Steffen Eger, Christoph Leiter, Juri Opitz, Erion Çano
Venues:
Eval4NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
119–125
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.eval4nlp-1.10/
Cite (ACL):
Shurui Du. 2025. TitleTrap: Probing Presentation Bias in LLM-Based Scientific Reviewing. In Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems, pages 119–125, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
TitleTrap: Probing Presentation Bias in LLM-Based Scientific Reviewing (Du, Eval4NLP 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.eval4nlp-1.10.pdf