PeerCheck: Enhancing LLM-Generated Academic Reviews Towards Human-Level Quality
Zeyuan Chen, Ziqing Yang, Yihan Ma, Michael Backes, Yang Zhang
Abstract
As academic submissions grow, the traditional peer review process struggles to keep up, raising concerns about quality and fairness. A trend of using large language models (LLMs) for assistance has emerged. In this work, we take a critical step toward improving the quality of LLM-generated reviews. We propose the PeerCheck framework, which investigates LLM-human review differences (RQ1) and explores methods to increase LLM-human similarity (RQ2). We first compared human-written reviews with reviews generated by GPT-4o, Claude-3.7-Sonnet, and DeepSeek-V3 and found that LLMs and humans focus on different terms, e.g., LLMs prioritize theory while humans emphasize methodology and experiments. We further adopt prompt engineering, such as Chain-of-Thought (CoT), and utilize retrieval-augmented generation (RAG) to enhance the LLM-generated reviews towards human-level quality. We find CoT significantly improves the human similarity of LLM reviews, while we also discover an unexpected “RAG paradox,” i.e., experiments with RAG produce different results for various LLMs and, in some cases, even reduce review quality. Our comprehensive analysis of LLM-generated academic reviews illustrates both possibilities and limitations, contributing to a more effective, human-aligned review system.
- Anthology ID:
- 2026.findings-acl.1170
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 23362–23386
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1170/
- Cite (ACL):
- Zeyuan Chen, Ziqing Yang, Yihan Ma, Michael Backes, and Yang Zhang. 2026. PeerCheck: Enhancing LLM-Generated Academic Reviews Towards Human-Level Quality. In Findings of the Association for Computational Linguistics: ACL 2026, pages 23362–23386, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- PeerCheck: Enhancing LLM-Generated Academic Reviews Towards Human-Level Quality (Chen et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1170.pdf