@inproceedings{liu-hou-2026-aggregating,
  title     = {Aggregating Crowd of {LLM}s for Cost-Effective Data Annotation},
  author    = {Liu, Jiacheng and
               Hou, Xiaofeng},
  editor    = {Demberg, Vera and
               Inui, Kentaro and
               Marquez, Llu{\'i}s},
  booktitle = {Findings of the {A}ssociation for {C}omputational {L}inguistics: {EACL} 2026},
  month     = mar,
  year      = {2026},
  address   = {Rabat, Morocco},
  publisher = {Association for Computational Linguistics},
  url       = {https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.125/},
  pages     = {2407--2419},
  isbn      = {979-8-89176-386-9},
  abstract  = {Recent advancements in Large Language Models (LLMs) have shown promise for automated data annotation, yet reliance on expensive commercial models like GPT-4 limits accessibility. This paper rigorously evaluates the potential of open-source smaller LLMs (sLLMs) as a cost-effective alternative. We introduce a new benchmark dataset, Multidisciplinary Open Research Data (MORD), comprising 12,277 annotated sentence segments from 1,500 scholarly articles across five research domains, to systematically assess sLLM performance. Our experiments demonstrate that sLLMs achieve annotation quality surpassing Amazon MTurk workers and approach GPT-4{'}s accuracy at significantly lower costs. We further propose to build the Crowd of LLMs, which aggregates annotations from multiple sLLMs using label aggregation algorithms. This approach not only outperforms individual sLLMs but also reveals that combining sLLM annotations with human crowd labels yields superior results compared to either method alone. Our findings highlight the viability of sLLMs for democratizing high-quality data annotation while underscoring the need for tailored aggregation methods to fully realize their potential.},
}
Markdown (Informal)
[Aggregating Crowd of LLMs for Cost-Effective Data Annotation](https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.125/) (Liu & Hou, Findings 2026)
ACL