CINEMETRIC: A Framework for Multi-Perspective Evaluation of Conversational Agents using Human-AI Collaboration

Vahid Sadiri Javadi, Zain Ul Abedin, Lucie Flek


Abstract
Despite advances in conversational systems, the evaluation of such systems remains a challenging problem. Current evaluation paradigms often rely on costly homogeneous human annotators or oversimplified automated metrics, leading to a critical gap in socially aligned conversational agents, where pluralistic values (i.e., acknowledging diverse human experiences) are essential to reflect the inherently subjective and contextual nature of dialogue quality. In this paper, we propose CINEMETRIC, a novel framework that operationalizes pluralistic alignment by leveraging the perspectivist capacities of large language models. Our approach introduces a mechanism where LLMs simulate a diverse set of evaluators, each with distinct personas constructed by matching real human annotators to movie characters based on both demographic profiles and annotation behaviors. These role-played characters independently assess subjective tasks, offering a scalable and human-aligned alternative to traditional evaluation. Empirical results show that our approach consistently outperforms baseline methods, including LLM as a Judge and as a Personalized Judge, across multiple LLMs, showing high and consistent agreement with human ground truth. CINEMETRIC improves accuracy by up to 20% and reduces mean absolute error in toxicity prediction, demonstrating its effectiveness in capturing human-like perspectives.
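The abstract reports agreement with human ground truth in terms of accuracy and mean absolute error. As a rough illustration only, the sketch below shows how those two metrics could be computed when each simulated persona is compared with the human annotator it was matched to. The persona names, the 0-4 toxicity scale, and all ratings are hypothetical placeholders, not data or code from the paper.

```python
from statistics import mean


def accuracy(predicted, reference):
    """Fraction of items where the predicted label equals the reference label."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired scores."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


# Hypothetical toxicity ratings (assumed 0-4 scale) produced by three
# simulated persona evaluators on the same four items.
persona_ratings = {
    "persona_a": [0, 3, 1, 4],
    "persona_b": [1, 3, 1, 4],
    "persona_c": [0, 2, 2, 3],
}

# Hypothetical ratings from the human annotator matched to each persona.
human_ratings = {
    "persona_a": [0, 3, 2, 4],
    "persona_b": [1, 2, 1, 4],
    "persona_c": [0, 2, 2, 4],
}

# Score each persona against its own matched annotator.
per_persona = {
    name: {
        "accuracy": accuracy(persona_ratings[name], human_ratings[name]),
        "mae": mean_absolute_error(persona_ratings[name], human_ratings[name]),
    }
    for name in persona_ratings
}

for name, scores in per_persona.items():
    print(name, scores)
```

Scoring each persona against its own matched annotator, rather than against a single majority label, is one way to keep disagreement between annotators visible; whether the paper aggregates this way is not stated in the abstract.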
Anthology ID:
2025.nlperspectives-1.2
Volume:
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Gavin Abercrombie, Valerio Basile, Simona Frenda, Sara Tonelli, Shiran Dudy
Venues:
NLPerspectives | WS
Publisher:
Association for Computational Linguistics
Pages:
15–26
URL:
https://preview.aclanthology.org/ingest-emnlp/2025.nlperspectives-1.2/
Cite (ACL):
Vahid Sadiri Javadi, Zain Ul Abedin, and Lucie Flek. 2025. CINEMETRIC: A Framework for Multi-Perspective Evaluation of Conversational Agents using Human-AI Collaboration. In Proceedings of the 4th Workshop on Perspectivist Approaches to NLP, pages 15–26, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
CINEMETRIC: A Framework for Multi-Perspective Evaluation of Conversational Agents using Human-AI Collaboration (Javadi et al., NLPerspectives 2025)
PDF:
https://preview.aclanthology.org/ingest-emnlp/2025.nlperspectives-1.2.pdf