Evaluating the Impact of Reviewer Guideline Design on LLM-Based Automated Peer Review

Haowen Li, Yoichi Ishibashi, Masafumi Oyamada


Abstract
Peer review is an essential process in scientific research, yet the growing workload has made its automation increasingly necessary. In this study, we analyze how different types of reviewer guidelines, such as official conference guidelines and reviewer-imitating ones distilled from high-quality human reviews, affect automated peer review. Our experiments show that official conference guidelines produce review results most consistent with human judgments, suggesting that evaluation criteria refined through conference practice also serve as effective guidance for automated reviewing. In contrast, reviewer-imitating guidelines, especially those enforcing strict rubric-style scoring, consistently degrade automated review performance, highlighting the importance of allowing subjective and holistic scoring.
Anthology ID:
2026.findings-acl.1511
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
30223–30240
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1511/
Cite (ACL):
Haowen Li, Yoichi Ishibashi, and Masafumi Oyamada. 2026. Evaluating the Impact of Reviewer Guideline Design on LLM-Based Automated Peer Review. In Findings of the Association for Computational Linguistics: ACL 2026, pages 30223–30240, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Evaluating the Impact of Reviewer Guideline Design on LLM-Based Automated Peer Review (Li et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1511.pdf
Checklist:
2026.findings-acl.1511.checklist.pdf