Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation

Arjun Chandra, Kevin Miller, Venkatesh Ravichandran, Constantinos Papayiannis, Venkatesh Saligrama


Abstract
Large Language Model (LLM) judges exhibit strong reasoning capabilities but are limited to textual content. This leaves current automatic Speech-to-Speech (S2S) evaluation methods reliant on opaque and expensive Audio Language Models (ALMs). In this work, we propose TRACE (Textual Reasoning over Audio Cues for Evaluation), a novel framework that enables LLM judges to reason over audio cues to achieve cost-efficient and human-aligned S2S evaluation. To demonstrate the strength of the framework, we first introduce a Human Chain-of-Thought (HCoT) annotation protocol to improve the diagnostic capability of existing judge benchmarks by separating evaluation into explicit dimensions: content (C), voice quality (VQ), and paralinguistics (P). Using this data, TRACE constructs a textual blueprint of inexpensive audio signals and prompts an LLM to render dimension-wise judgments, fusing them into an overall rating via a deterministic policy. TRACE achieves higher agreement with human raters than ALMs and transcript-only LLM judges while being significantly more cost-effective. We will release the HCoT annotations and the TRACE framework to enable scalable and human-aligned S2S evaluation.
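The pipeline described in the abstract (a textual blueprint of audio signals, dimension-wise judgments, deterministic fusion) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the cue fields (`speaking_rate_wps`, `mean_pitch_hz`, `snr_db`), the pairwise "A"/"B"/"tie" verdict format, the helper names, and above all the lexicographic priority order used in `fuse`. The LLM call itself is omitted; the per-dimension verdicts are supplied by hand.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical priority order. The abstract names the three dimensions
# but does not say how the deterministic policy ranks them.
PRIORITY = ("content", "voice_quality", "paralinguistics")
VALID_VERDICTS = ("A", "B", "tie")


@dataclass
class AudioCues:
    """Illustrative inexpensive signals; the field choice is an assumption."""
    transcript: str
    speaking_rate_wps: float  # words per second
    mean_pitch_hz: float
    snr_db: float


def build_blueprint(label: str, cues: AudioCues) -> str:
    """Render the cues as plain text that a text-only LLM judge could read."""
    return (
        f"[Response {label}]\n"
        f"transcript: {cues.transcript}\n"
        f"speaking_rate: {cues.speaking_rate_wps:.1f} words/s\n"
        f"mean_pitch: {cues.mean_pitch_hz:.0f} Hz\n"
        f"snr: {cues.snr_db:.0f} dB"
    )


def fuse(judgments: Dict[str, str]) -> str:
    """Fuse per-dimension verdicts with a fixed lexicographic rule.

    The first dimension in PRIORITY whose verdict is not a tie decides
    the overall result. This rule is a stand-in, not the paper's policy.
    """
    for dim in PRIORITY:
        verdict = judgments[dim]
        if verdict not in VALID_VERDICTS:
            raise ValueError(f"unexpected verdict for {dim}: {verdict!r}")
        if verdict != "tie":
            return verdict
    return "tie"


if __name__ == "__main__":
    cues_a = AudioCues("Sure, the store opens at nine.", 2.5, 180.0, 25.0)
    print(build_blueprint("A", cues_a))
    # Verdicts an LLM judge might return after reading the blueprints.
    print(fuse({"content": "tie", "voice_quality": "B", "paralinguistics": "A"}))
```

Because the fusion step is a pure function of the three verdicts, the same dimension-wise judgments always produce the same overall rating, which is the property the abstract attributes to a deterministic policy.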
Anthology ID:
2026.findings-eacl.151
Volume:
Findings of the Association for Computational Linguistics: EACL 2026
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
2895–2916
URL:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.151/
Cite (ACL):
Arjun Chandra, Kevin Miller, Venkatesh Ravichandran, Constantinos Papayiannis, and Venkatesh Saligrama. 2026. Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation. In Findings of the Association for Computational Linguistics: EACL 2026, pages 2895–2916, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation (Chandra et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.151.pdf
Checklist:
2026.findings-eacl.151.checklist.pdf