Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation
Arjun Chandra, Kevin Miller, Venkatesh Ravichandran, Constantinos Papayiannis, Venkatesh Saligrama
Abstract
Large Language Model (LLM) judges exhibit strong reasoning capabilities but are limited to textual content. This leaves current automatic Speech-to-Speech (S2S) evaluation methods reliant on opaque and expensive Audio Language Models (ALMs). In this work, we propose TRACE (Textual Reasoning over Audio Cues for Evaluation), a novel framework that enables LLM judges to reason over audio cues to achieve cost-efficient and human-aligned S2S evaluation. To demonstrate the strength of the framework, we first introduce a Human Chain-of-Thought (HCoT) annotation protocol to improve the diagnostic capability of existing judge benchmarks by separating evaluation into explicit dimensions: content (C), voice quality (VQ), and paralinguistics (P). Using this data, TRACE constructs a textual blueprint of inexpensive audio signals and prompts an LLM to render dimension-wise judgments, fusing them into an overall rating via a deterministic policy. TRACE achieves higher agreement with human raters than ALMs and transcript-only LLM judges while being significantly more cost-effective. We will release the HCoT annotations and the TRACE framework to enable scalable and human-aligned S2S evaluation.
- Anthology ID:
- 2026.findings-eacl.151
- Volume:
- Findings of the Association for Computational Linguistics: EACL 2026
- Month:
- March
- Year:
- 2026
- Address:
- Rabat, Morocco
- Editors:
- Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2895–2916
- URL:
- https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.151/
- Cite (ACL):
- Arjun Chandra, Kevin Miller, Venkatesh Ravichandran, Constantinos Papayiannis, and Venkatesh Saligrama. 2026. Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation. In Findings of the Association for Computational Linguistics: EACL 2026, pages 2895–2916, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal):
- Hearing Between the Lines: Unlocking the Reasoning Power of LLMs for Speech Evaluation (Chandra et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.151.pdf