PeerCheck: Enhancing LLM-Generated Academic Reviews Towards Human-Level Quality

Zeyuan Chen; Ziqing Yang; Yihan Ma; Michael Backes; Yang Zhang

Abstract

As academic submissions grow, the traditional peer review process struggles to keep up, raising concerns about quality and fairness. A trend of using large language models (LLMs) for assistance has emerged. In this work, we take a critical step toward improving the quality of LLM-generated reviews. We propose the PeerCheck framework, which investigates LLM-human review differences (RQ1) and explores methods to increase LLM-human similarity (RQ2). We first compare human-written reviews with reviews generated by GPT-4o, Claude-3.7-Sonnet, and DeepSeek-V3 and find that LLMs and humans focus on different terms, e.g., LLMs prioritize theory while humans emphasize methodology and experiments. We further adopt prompt engineering, such as Chain-of-Thought (CoT), and utilize retrieval-augmented generation (RAG) to enhance the LLM-generated reviews towards human-level quality. We find that CoT significantly improves the human similarity of LLM reviews, while we also discover an unexpected “RAG paradox,” i.e., experiments with RAG produce different results for various LLMs and, in some cases, even reduce review quality. Our comprehensive analysis of LLM-generated academic reviews illustrates both possibilities and limitations, contributing to a more effective, human-aligned review system.

Anthology ID: 2026.findings-acl.1170
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 23362–23386
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1170/
Cite (ACL): Zeyuan Chen, Ziqing Yang, Yihan Ma, Michael Backes, and Yang Zhang. 2026. PeerCheck: Enhancing LLM-Generated Academic Reviews Towards Human-Level Quality. In Findings of the Association for Computational Linguistics: ACL 2026, pages 23362–23386, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): PeerCheck: Enhancing LLM-Generated Academic Reviews Towards Human-Level Quality (Chen et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1170.pdf
Checklist: 2026.findings-acl.1170.checklist.pdf
