Can AI Revise Research Papers with Human Review Feedback? An Empirical Study and Benchmark

Zihan Luo; Hong Huang; Jianxun Lian; Yu Chang; Xing Xie; Hai Jin

Abstract

Human-AI collaboration can speed up the research process for experts and allow anyone with critical thinking skills to conduct innovative work. A key part of this collaboration is the AI’s ability to improve a paper in response to human feedback, updating both the text and the experiments to meet high standards. To evaluate this skill, we introduce ReviseBench, an extensible benchmark built on real academic data that can be scaled easily via agent-driven automated data collection. It tests the ability of Large Language Models (LLMs) in paper interpretation, experimental implementation, and paper formulation, using the authors’ camera-ready versions as natural human baselines. To facilitate fine-grained assessment, we further propose ReviseArena, a platform that supports pair-wise comparisons between different AI-revised papers. Our initial evaluation on ReviseBench reveals that even state-of-the-art foundation LLMs struggle significantly in this domain: they achieve a win rate of less than 10% against human experts and exhibit issues such as incremental revision, unprofessional revision, and potential data fabrication. Our code and data are publicly available at: https://github.com/CGCL-codes/ReviseBench.

Anthology ID: 2026.findings-acl.887
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 17876–17893
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.887/
Cite (ACL): Zihan Luo, Hong Huang, Jianxun Lian, Yu Chang, Xing Xie, and Hai Jin. 2026. Can AI Revise Research Papers with Human Review Feedback? An Empirical Study and Benchmark. In Findings of the Association for Computational Linguistics: ACL 2026, pages 17876–17893, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Can AI Revise Research Papers with Human Review Feedback? An Empirical Study and Benchmark (Luo et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.887.pdf
Checklist: 2026.findings-acl.887.checklist.pdf
