Generative AI for Technical Writing: Comparing Human and LLM Assessments of Generated Content

Karen de Souza; Alexandre Nikolaev; Maarit Koponen

Abstract

Large language models (LLMs) have recently gained significant attention for their capabilities in natural language processing (NLP), particularly in generative artificial intelligence (AI). LLMs can also be useful tools for technical writers who produce software documentation. We present an assessment of technical documentation content generated by three different LLMs using retrieval-augmented generation (RAG) with product documentation as a knowledge base. The LLM-generated responses were analyzed in three ways: 1) manual error analysis by a technical writer, 2) automatic assessment using deterministic metrics (BLEU, ROUGE, token overlap), and 3) evaluation of correctness using an LLM as a judge. The results of these assessments were compared using network analysis and linear regression models to investigate statistical relationships, model preferences, and the distribution of human and LLM scores. The analyses indicate that human quality evaluation is more closely related to the LLM correctness judgment than to the deterministic metrics, even when different analysis frameworks are used.
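As an illustration of the simplest of the deterministic metrics named above, the following is a minimal sketch of a token-overlap score. The abstract does not define the metric, so the choices here (lowercased whitespace tokens, F1 over shared token counts) and the function name `token_overlap_f1` are assumptions for illustration only, not the authors' implementation.

```python
from collections import Counter


def token_overlap_f1(candidate: str, reference: str) -> float:
    """Token-level F1 between a generated answer and a reference answer.

    Assumed definition: lowercased whitespace tokens, with overlap counted
    as the multiset intersection of the two token bags.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    shared = sum((cand & ref).values())  # tokens present in both texts
    if shared == 0:
        return 0.0
    precision = shared / sum(cand.values())
    recall = shared / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    generated = "Click the Save button"
    reference = "Press the Save button"
    print(token_overlap_f1(generated, reference))
```

A score like this rewards shared wording only, which is one plausible reason such metrics can diverge from a human or LLM judgment of correctness: a paraphrase that is fully correct may share few tokens with the reference.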

Anthology ID: 2025.nodalida-1.67
Volume: Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)
Month: March
Year: 2025
Address: Tallinn, Estonia
Editors: Richard Johansson, Sara Stymne
Venue: NoDaLiDa
Publisher: University of Tartu Library
Pages: 661–679
URL: https://preview.aclanthology.org/corrections-2025-07/2025.nodalida-1.67/
Cite (ACL): Karen de Souza, Alexandre Nikolaev, and Maarit Koponen. 2025. Generative AI for Technical Writing: Comparing Human and LLM Assessments of Generated Content. In Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025), pages 661–679, Tallinn, Estonia. University of Tartu Library.
Cite (Informal): Generative AI for Technical Writing: Comparing Human and LLM Assessments of Generated Content (Souza et al., NoDaLiDa 2025)
PDF: https://preview.aclanthology.org/corrections-2025-07/2025.nodalida-1.67.pdf
