Concord: An Agreement-Aware Multi-Adjudication Pipeline for LLM Evaluation

Tyler Bliss, Mahit Verma, Aila Iyer-Singh, Subrata Biswas, Sheikh Asif Imran, Bashima Islam


Abstract
Evaluating multimodal generations is challenging: human evaluation is costly, and single-model LLM-as-a-judge pipelines can be brittle and provide limited uncertainty signals. We introduce Concord, an ensemble-based evaluation pipeline that aggregates discrete judgments from multiple LLM judges and uses inter-judge agreement as a practical uncertainty signal for disagreement-driven triage. We evaluate Concord on AVSSD and SCORE-AVS, a ground-truth-supervised audio-visual benchmark with discrete labels (True/False or 0–5). Concord improves agreement with human judgments over single-judge and naive aggregation baselines, and prioritizing low-agreement instances focuses human review on the most ambiguous cases. We use locally hosted open-source judges and also report binary results for two larger online models, GPT4.o mini turbo and Gemini 3.1 Flash Lite.
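The general idea of aggregating discrete judge votes and using their agreement to select items for human review can be sketched as follows. This is an illustrative sketch only: the abstract does not specify Concord's aggregation rule, so plain majority voting, the agreement ratio, the function names `aggregate` and `triage`, and the 0.75 threshold are all assumptions made for this example.

```python
from collections import Counter

def aggregate(votes):
    """Return (majority_label, agreement) for one item's judge votes.

    Assumption: simple majority vote; agreement is the fraction of judges
    that chose the winning label. Ties go to the label seen first.
    """
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    return label, top_count / len(votes)

def triage(votes_by_item, threshold=0.75):
    """Return item ids whose agreement is below `threshold`,
    ordered from least to most agreement (hypothetical triage rule)."""
    scored = []
    for item_id, votes in votes_by_item.items():
        _, agreement = aggregate(votes)
        if agreement < threshold:
            scored.append((agreement, item_id))
    scored.sort(key=lambda pair: pair[0])
    return [item_id for _, item_id in scored]

# Example with made-up votes from four hypothetical judges (0-5 labels).
example_votes = {
    "clip_1": [4, 4, 4, 4],   # unanimous
    "clip_2": [4, 3, 4, 4],   # one dissent
    "clip_3": [1, 2, 3, 1],   # heavy disagreement
}
review_queue = triage(example_votes, threshold=0.8)
```

Here `review_queue` lists the low-agreement items first, which is the ordering a reviewer would work through under this assumed rule.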
Anthology ID:
2026.gem-main.46
Volume:
Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Simon Mille, Sebastian Gehrmann, Patrícia Schmidtová, Ondřej Dušek, Marzieh Fadaee, Kyle Lo, Enrico Santus, Gabriel Stanovsky
Venues:
GEM | WS
Publisher:
Association for Computational Linguistics
Pages:
502–510
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.46/
Cite (ACL):
Tyler Bliss, Mahit Verma, Aila Iyer-Singh, Subrata Biswas, Sheikh Asif Imran, and Bashima Islam. 2026. Concord: An Agreement-Aware Multi-Adjudication Pipeline for LLM Evaluation. In Proceedings of the Fifth Workshop on Generation, Evaluation and Metrics (GEM), pages 502–510, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Concord: An Agreement-Aware Multi-Adjudication Pipeline for LLM Evaluation (Bliss et al., GEM 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.gem-main.46.pdf