LLMs as annotators of argumentation

Anna Lindahl
Abstract
Annotated data is essential for most NLP tasks, but creating it can be time-consuming and challenging. Argumentation annotation is especially complex, often resulting in moderate human agreement. While large language models (LLMs) have excelled in increasingly complex tasks, their application to argumentation annotation has been limited. This paper investigates how well GPT-4o and Claude can annotate three types of argumentation in Swedish data compared to human annotators. Using full annotation guidelines, we evaluate the models on argumentation schemes, argumentative spans, and attitude annotation. Both models perform similarly to humans across all tasks, with Claude showing better human agreement than GPT-4o. Agreement between models is higher than human agreement in argumentation scheme and span annotation.
Anthology ID:
2025.starsem-1.19
Volume:
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Lea Frermann, Mark Stevenson
Venue:
*SEM
Publisher:
Association for Computational Linguistics
Pages:
242–252
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.starsem-1.19/
Cite (ACL):
Anna Lindahl. 2025. LLMs as annotators of argumentation. In Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025), pages 242–252, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
LLMs as annotators of argumentation (Lindahl, *SEM 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.starsem-1.19.pdf