EMPATH: An Ensemble Method for Automatic Fine-Grained Turn-Level Dialogue Empathy Evaluation with a Novel Emotional Distance Metric

Dongning Rao, Zhihua Liang, Zhihua Jiang

Abstract

Empathy is key to many professions. In recognition of this, the Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA) has hosted competitions on evaluating empathy in dialogue. Although fine-tuning has proved successful in these competitions, existing approaches have at least three shortcomings. First, empathy-specific metrics are lacking. Second, classical dialogue evaluation metrics have not been sufficiently investigated for this task. Third, the potential of ensembles remains underexplored. To address these issues, we propose EMPATH, a framework that combines fine-tuned models, large language models, classical dialogue evaluation metrics, and a novel metric. The novel metric, ED (emotional distance), rewards responses whose emotional tone is appropriate to the context. For example, if the user expresses joy, a cheerful response should be ranked higher. Furthermore, we introduce HO, a robust, label-free ensemble strategy that integrates the sub-metrics with the lowest correlation coefficients first. In addition to the WASSA benchmark, we test EMPATH's generalizability on the EmpatheticExchanges dataset (EX). Experimental results show that EMPATH achieves the best results on the competition dataset, and ablation studies validate our choice of components. On EX, the winning WASSA 2024 system reaches a Pearson correlation coefficient of 0.4066, whereas EMPATH reaches 0.4860, a statistically significant absolute improvement of about 0.08.
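The abstract describes HO only at a high level. As a rough illustration, the Python sketch below shows one hypothetical reading of a label-free, "lowest correlation first" ensemble: it ranks sub-metrics by their mean absolute Pearson correlation with the other sub-metrics (no human labels are involved), then averages the z-normalized scores of the top k. The function names (`low_correlation_order`, `ensemble`), the mean-absolute-correlation criterion, z-normalization, and simple averaging are all assumptions made for illustration; this is not the authors' HO algorithm.

```python
import numpy as np


def pearson(a, b):
    # Pearson correlation between two per-turn score vectors
    return float(np.corrcoef(a, b)[0, 1])


def low_correlation_order(scores):
    """Rank sub-metrics by mean |Pearson r| with all other sub-metrics, ascending.

    ASSUMPTION: "lowest correlation first" is read as lowest mean absolute
    correlation among sub-metrics; no human labels are used (label-free).
    `scores` maps a sub-metric name to a 1-D array of per-turn scores.
    Requires at least two sub-metrics.
    """
    names = list(scores)
    mean_abs = {}
    for n in names:
        rs = [abs(pearson(scores[n], scores[m])) for m in names if m != n]
        mean_abs[n] = sum(rs) / len(rs)
    return sorted(names, key=lambda n: mean_abs[n])


def zscore(x):
    # Put sub-metrics on a common scale before averaging
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def ensemble(scores, k=None):
    # Hypothetical combiner: average the z-scores of the first k sub-metrics
    # in low-correlation order (all of them when k is None)
    order = low_correlation_order(scores)
    chosen = order if k is None else order[:k]
    combined = np.mean([zscore(scores[n]) for n in chosen], axis=0)
    return combined, chosen


# Toy example: "a" and "b" are near-duplicates, "c" is dissimilar
scores = {
    "a": [1, 2, 3, 4, 5],
    "b": [1, 2, 3, 4, 6],
    "c": [5, 1, 4, 2, 3],
}
combined, chosen = ensemble(scores, k=2)
print(chosen)  # ['c', 'b']
```

In this toy case the dissimilar sub-metric "c" is ranked first, so the ensemble adds complementary signals before redundant ones; whether HO uses this criterion or a different one is not stated in the abstract.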

Anthology ID: 2026.findings-acl.1790
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 35921–35942
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1790/
Cite (ACL): Dongning Rao, Zhihua Liang, and Zhihua Jiang. 2026. EMPATH: An Ensemble Method for Automatic Fine-Grained Turn-Level Dialogue Empathy Evaluation with a Novel Emotional Distance Metric. In Findings of the Association for Computational Linguistics: ACL 2026, pages 35921–35942, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): EMPATH: An Ensemble Method for Automatic Fine-Grained Turn-Level Dialogue Empathy Evaluation with a Novel Emotional Distance Metric (Rao et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1790.pdf
Checklist: 2026.findings-acl.1790.checklist.pdf