Hope Schroeder
2025
Just Put a Human in the Loop? Investigating LLM-Assisted Annotation for Subjective Tasks
Hope Schroeder | Deb Roy | Jad Kabbara
Findings of the Association for Computational Linguistics: ACL 2025
LLM use in annotation is becoming widespread, and given LLMs’ overall promising performance and speed, putting humans in the loop to simply “review” LLM annotations can be tempting. In subjective tasks with multiple plausible answers, this can affect both the evaluation of LLM performance and downstream social science analyses that use these labels. In a pre-registered experiment with 350 unique annotators and 7,000 annotations across 4 conditions, 2 models, and 2 datasets, we find that presenting crowdworkers with LLM-generated annotation suggestions did not make them faster annotators, but did improve their self-reported confidence in the task. More importantly, annotators readily adopted the LLM suggestions, significantly changing the label distribution compared to the baseline. We show that when labels created with LLM assistance are used to evaluate LLM performance, reported model performance significantly increases. We also show how these shifts in label distributions can affect conclusions drawn from analyzing even “human-approved” LLM-annotated datasets. We believe our work underlines the importance of understanding the impact of LLM-assisted annotation on subjective, qualitative tasks, on the creation of gold data for training and testing, and on the evaluation of NLP systems on subjective tasks.
2024
Fora: A corpus and framework for the study of facilitated dialogue
Hope Schroeder | Deb Roy | Jad Kabbara
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Facilitated dialogue is increasingly popular as a method of civic engagement and for gathering social insight, but resources for its study are scant. We present Fora, a unique collection of annotated facilitated dialogues. We compile 262 facilitated conversations that were hosted with partner organizations seeking to engage their members and surface insights on issues like education, elections, and public health, primarily through the sharing of personal experience. Alongside this corpus of 39,911 speaker turns, we present a framework for the analysis of facilitated dialogue. We taxonomize key personal sharing behaviors and facilitation strategies in the corpus, annotate a 25% sample (10,000+ speaker turns) of the data accordingly, and establish baselines on a number of tasks essential to identifying these phenomena in dialogue. We describe the data and relate facilitator behavior to turn-taking and participant sharing. We outline how this research can inform future work in understanding and improving facilitated dialogue, parsing spoken conversation, and improving the behavior of dialogue agents.