From Scores to Preferences: Redefining Evaluation Paradigm for Speech Quality Reward Modeling

Yifei Cao; Changhao Jiang; Jiabao Zhuang; Jiajun Sun; Ming Zhang; Zhiheng Xi; Hui Li; Shihan Dou; Yuran Wang; Yunke Zhang; Tao Ji; Tao Gui; Qi Zhang; Xuan-Jing Huang (黄萱菁)

Abstract

Speech quality assessment (SQA) is typically formulated as a score regression task on subjective ratings such as the Mean Opinion Score (MOS). Such ratings inherently suffer from inconsistent standards, which limits cross-dataset training and evaluation. To address these limitations, we reformulate SQA as a preference-based comparison paradigm and construct MOS-Pref, a large-scale MOS-derived preference dataset. Building on MOS-Pref, we systematically implement and evaluate three reward modeling paradigms (scalar, semi-scalar, and generative reward models (GRMs)) alongside existing SQA approaches. Our experiments reveal three key findings: (1) scalar models achieve the strongest overall performance, consistently exceeding 74% accuracy; (2) score regression-based approaches generally underperform preference-based methods in both overall performance and generalization; and (3) all reward models struggle on pairs with very small MOS gaps. Motivated by these observations, we propose a MOS-aware GRM design that incorporates the MOS gap into the reward function during reinforcement learning. Experimental results show that the MOS-aware GRM significantly improves fine-grained speech quality discrimination. We hope this work fosters more rigorous and scalable research in SQA.
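The abstract does not specify how MOS ratings are turned into preference pairs, how accuracy is broken down by MOS gap, or what form the MOS-aware reward takes. The following is a minimal sketch under stated assumptions, not the authors' implementation: pairs are formed only within a single dataset (since MOS scales are not comparable across datasets), near-ties below a threshold are dropped, and the gap-aware reward is one illustrative choice. All function and parameter names (`build_preference_pairs`, `accuracy_by_gap`, `gap_aware_reward`, `min_gap`, `edges`, `alpha`, `tau`) are hypothetical.

```python
from itertools import combinations
import math


def build_preference_pairs(items, min_gap=0.0):
    """Convert (utterance_id, mos) records from ONE dataset into
    (preferred_id, rejected_id, mos_gap) triples.

    Assumption: pairs are drawn only within a single dataset, and
    near-ties (gap <= min_gap) are skipped.
    """
    pairs = []
    for (id_a, mos_a), (id_b, mos_b) in combinations(items, 2):
        gap = abs(mos_a - mos_b)
        if gap <= min_gap:
            continue
        if mos_a > mos_b:
            pairs.append((id_a, id_b, gap))
        else:
            pairs.append((id_b, id_a, gap))
    return pairs


def accuracy_by_gap(pairs, prefers, edges=(0.5, 1.0)):
    """Return overall pairwise accuracy and accuracy per MOS-gap bucket.

    prefers(a, b) wraps any reward model and returns True when it judges
    utterance a to be of higher quality than utterance b. Bucket i holds
    gaps below edges[i]; bucket len(edges) holds all larger gaps.
    """
    buckets = {}
    correct_total = 0
    for chosen, rejected, gap in pairs:
        idx = next((i for i, e in enumerate(edges) if gap < e), len(edges))
        hit = prefers(chosen, rejected)
        correct_total += hit
        n, c = buckets.get(idx, (0, 0))
        buckets[idx] = (n + 1, c + hit)
    per_bucket = {i: c / n for i, (n, c) in sorted(buckets.items())}
    return correct_total / len(pairs), per_bucket


def gap_aware_reward(is_correct, gap, alpha=1.0, tau=0.5):
    """Hypothetical way to fold the MOS gap into an RL reward (NOT the
    paper's formula): a correct verdict earns 1 plus a bonus that grows as
    the gap shrinks, emphasizing hard, fine-grained pairs."""
    if not is_correct:
        return 0.0
    return 1.0 + alpha * math.exp(-gap / tau)


if __name__ == "__main__":
    items = [("u1", 4.2), ("u2", 3.1), ("u3", 3.0), ("u4", 1.8)]
    pairs = build_preference_pairs(items)
    # Toy scores standing in for a reward model's outputs.
    scores = {"u1": 0.9, "u2": 0.5, "u3": 0.6, "u4": 0.1}
    acc, per_bucket = accuracy_by_gap(pairs, lambda a, b: scores[a] > scores[b])
    # Overall 5/6: the only miss is the near-tie pair (u2, u3).
    print(acc, per_bucket)
```

Reporting accuracy per gap bucket is what makes a weakness on very small MOS gaps visible; a single overall accuracy would hide it behind the many easy, large-gap pairs.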

Anthology ID: 2026.findings-acl.1638
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 32731–32749
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1638/
Cite (ACL): Yifei Cao, Changhao Jiang, Jiabao Zhuang, Jiajun Sun, Ming Zhang, Zhiheng Xi, Hui Li, Shihan Dou, Yuran Wang, Yunke Zhang, Tao Ji, Tao Gui, Qi Zhang, and Xuanjing Huang. 2026. From Scores to Preferences: Redefining Evaluation Paradigm for Speech Quality Reward Modeling. In Findings of the Association for Computational Linguistics: ACL 2026, pages 32731–32749, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): From Scores to Preferences: Redefining Evaluation Paradigm for Speech Quality Reward Modeling (Cao et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1638.pdf
Checklist: 2026.findings-acl.1638.checklist.pdf
