ALERT: An LLM-powered Benchmark for Automatic Evaluation of Recommendation Explanations

Yichuan Li; Xinyang Zhang; Chenwei Zhang; Mao Li; Tianyi Liu; Pei Chen; Yifan Gao; Kyumin Lee; Kaize Ding; Zhengyang Wang; Zhihan Zhang; Jingbo Shang; Xian Li; Trishul Chilimbi

Abstract

Recommendation explanation systems have become increasingly vital with the widespread adoption of recommender systems. However, existing recommendation explanation evaluation benchmarks suffer from limited item diversity, impractical user profiling requirements, and unreliable and unscalable evaluation protocols. We present ALERT, a model-agnostic recommendation explanation evaluation benchmark. The benchmark comprises three main contributions: 1) a diverse dataset encompassing 15 Amazon e-commerce categories with 2,761 user-item interactions, incorporating implicit preferences through purchase histories; 2) two novel LLM-powered automatic evaluators that enable scalable and human-preference aligned evaluation of explanations; and 3) a robust divide-and-aggregate approach that synthesizes multiple LLM judgments, achieving 70% concordance with expert human evaluation and substantially outperforming existing methods. ALERT facilitates comprehensive evaluation of recommendation explanations across diverse domains, advancing the development of more effective explanation systems.

Anthology ID: 2025.naacl-long.137
Volume: Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Month: April
Year: 2025
Address: Albuquerque, New Mexico
Editors: Luis Chiruzzo, Alan Ritter, Lu Wang
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 2704–2719
URL: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.137/
Cite (ACL): Yichuan Li, Xinyang Zhang, Chenwei Zhang, Mao Li, Tianyi Liu, Pei Chen, Yifan Gao, Kyumin Lee, Kaize Ding, Zhengyang Wang, Zhihan Zhang, Jingbo Shang, Xian Li, and Trishul Chilimbi. 2025. ALERT: An LLM-powered Benchmark for Automatic Evaluation of Recommendation Explanations. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 2704–2719, Albuquerque, New Mexico. Association for Computational Linguistics.
Cite (Informal): ALERT: An LLM-powered Benchmark for Automatic Evaluation of Recommendation Explanations (Li et al., NAACL 2025)
PDF: https://preview.aclanthology.org/fix-sig-urls/2025.naacl-long.137.pdf
