2025
Between the Drafts: An Evaluation Framework for Identifying Quality Improvement and Stylistic Differences in Scientific Texts
Danqing Chen | Ingo Weber | Felix Dietrich
Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems
This study explores the potential of a lightweight, open-source Large Language Model (LLM), demonstrating how its integration with Retrieval-Augmented Generation (RAG) can support cost-effective evaluation of revision quality and writing style differentiation. By retrieving reference documents from a carefully chosen and constructed corpus of peer-reviewed conference proceedings, our framework leverages few-shot in-context learning to track manuscript revisions and venue-specific writing styles. We demonstrate that the LLM-based evaluation aligns closely with human revision histories—consistently recognizing quality improvements across revision stages and distinguishing writing styles associated with different conference venues. These findings highlight how a carefully designed evaluation framework, integrated with adequate, representative data, can advance automated assessment of scientific writing.
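To make the described pipeline concrete, the sketch below illustrates one plausible way such a RAG-based, few-shot evaluation could be wired together: retrieve reference passages from a corpus of peer-reviewed papers, assemble them into a few-shot prompt, and ask a local open-source LLM to judge which of two drafts is the improved revision. This is an illustrative assumption, not the authors' implementation; the retrieval method (TF-IDF), the prompt layout, and the `query_llm` stub are all hypothetical stand-ins.

```python
# Minimal sketch (assumed, not the paper's actual framework): TF-IDF retrieval
# over a reference corpus, a few-shot prompt, and a placeholder LLM call.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def retrieve_references(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k corpus passages most similar to the query (TF-IDF retrieval)."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(corpus + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    top = scores.argsort()[::-1][:k]
    return [corpus[i] for i in top]


def build_prompt(draft_a: str, draft_b: str, references: list[str]) -> str:
    """Assemble a few-shot prompt: retrieved exemplars followed by the two drafts."""
    shots = "\n\n".join(
        f"Reference excerpt {i + 1}:\n{r}" for i, r in enumerate(references)
    )
    return (
        f"{shots}\n\n"
        "Given the reference excerpts above as examples of venue-appropriate "
        "writing, decide which draft is the improved revision and explain briefly.\n\n"
        f"Draft A:\n{draft_a}\n\nDraft B:\n{draft_b}\n\nAnswer:"
    )


def query_llm(prompt: str) -> str:
    """Hypothetical call to a local open-source LLM; replace with the real backend."""
    raise NotImplementedError("wire this to the local model serving endpoint")
```

In such a setup, the retrieved excerpts serve double duty: they ground the judgment in venue-specific writing conventions and act as in-context examples, which is what keeps a lightweight open-source model competitive without fine-tuning.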