Michinori Jinji
2026
MAPLE: Multi-Aspect Panels of LLM Evaluators for Open-Ended Questions
Michinori Jinji | Kyohei Atarashi | Koh Takeuchi | Hisashi Kashima
Findings of the Association for Computational Linguistics: ACL 2026
LLM-as-a-Judge, which uses LLMs to evaluate responses to open-ended questions, has seen significant growth in recent years. It has been adopted as a scalable alternative to manual human evaluation, such as crowdsourcing, which is often time-consuming and costly. However, the discrepancy between LLM-generated evaluations and human evaluations remains a critical problem in this field. To bridge this gap, we propose Multi-Aspect Panels of LLM Evaluators (MAPLE), a framework that orchestrates evaluations across multiple criteria using multiple LLMs. MAPLE integrates criterion-wise pairwise evaluations from multiple LLMs by estimating the importance of criteria and the reliability of individual evaluators. We conduct experiments with both open-source and closed-source models. Our results demonstrate that MAPLE achieves superior alignment with human evaluations compared to baselines, highlighting the importance of employing multi-agent and multi-criteria evaluation strategies.
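The abstract describes MAPLE's aggregation only at a high level. It does not say how criterion importance or evaluator reliability are estimated, so the sketch below is a hypothetical illustration of the general idea and not MAPLE's estimator. It combines criterion-wise pairwise votes from several evaluators. Criterion weights are supplied as fixed inputs. Evaluator reliabilities are estimated by a simple iterative agreement-with-consensus heuristic. The function name `aggregate_pairwise`, the ±1 vote encoding, the weight values and the reliability update are all assumptions made for this example.

```python
import numpy as np


def aggregate_pairwise(votes, criterion_weights, n_iter=10):
    """Combine criterion-wise pairwise votes from several LLM evaluators.

    Hypothetical illustration only, not MAPLE's actual estimator.
    votes: shape (n_pairs, n_evaluators, n_criteria); +1 if the evaluator
        prefers response A on that criterion, -1 if it prefers response B.
    criterion_weights: nonnegative importance per criterion (assumed given here).
    Returns (scores, reliabilities); a score > 0 means A is preferred.
    """
    votes = np.asarray(votes, dtype=float)
    w = np.asarray(criterion_weights, dtype=float)
    if votes.ndim != 3 or w.shape != (votes.shape[2],):
        raise ValueError("expected votes (pairs, evaluators, criteria) and one weight per criterion")
    w = w / w.sum()  # normalize criterion importance to sum to 1

    n_evaluators = votes.shape[1]
    uniform = np.full(n_evaluators, 1.0 / n_evaluators)
    reliab = uniform.copy()  # start by trusting every evaluator equally
    for _ in range(n_iter):
        # Score per pair: sum over evaluators k and criteria c of r_k * w_c * v[p, k, c]
        scores = np.einsum("pkc,k,c->p", votes, reliab, w)
        labels = np.where(scores >= 0, 1.0, -1.0)  # current consensus (ties go to A)
        # Heuristic reliability: fraction of an evaluator's votes that match the consensus
        agree = (votes == labels[:, None, None]).mean(axis=(0, 2))
        total = agree.sum()
        reliab = agree / total if total > 0 else uniform.copy()
    scores = np.einsum("pkc,k,c->p", votes, reliab, w)
    return scores, reliab


if __name__ == "__main__":
    # Toy data: evaluators 0 and 1 match a reference preference, evaluator 2 always disagrees.
    reference = np.array([1.0, -1.0, 1.0])
    votes = np.stack([reference, reference, -reference], axis=1)[:, :, None].repeat(2, axis=2)
    scores, reliab = aggregate_pairwise(votes, criterion_weights=[0.7, 0.3])
    print("scores:", scores)
    print("reliabilities:", reliab)  # evaluator 2 ends up with zero weight
```

The agreement-with-consensus update is a deliberately simple stand-in. A latent-class model in the style of Dawid and Skene, or weights fitted against human judgments, would be more principled ways to estimate evaluator reliability and criterion importance.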