Benjamin Evans
2026
Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators
Feng Gu | Zongxia Li | Carlos R. Colon | Benjamin Evans | Ishani Mondal | Jordan Lee Boyd-Graber
Findings of the Association for Computational Linguistics: ACL 2026
Event annotation is important for identifying, monitoring, and understanding sociological trends. Although expert annotators set the gold standard, they are expensive and inefficient. While state-of-the-art NLP models are an attractive alternative, they are often evaluated on standalone subtasks rather than entire workflows. Thus, we evaluate a holistic workflow that summarizes news with event coreference resolution and argument extraction in three modes: AI-only, AI-assisted, and human-only. Although AI’s recall at coreference resolution is seven times higher than that of the tf-idf baseline, it is far from replacing experts. However, experts adopt AI-extracted arguments 60% of the time, reducing extraction time by 25%. Our code and data are available at https://github.com/Obertura777/gtd-data.