Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios

Changhao Pan; Rui Yang; Han Wang; Zhuan Zhou; Xuming He; Wenxiang Guo; Ziyue Jiang; Ruiqi Li; Yu Zhang; Chenyuhao Wen; Ke Lei; Xiang Yin; Jingyu Lu; Zhiyuan Zhu; Zhou Zhao

Abstract

Recent advances in speech generation have enabled high-fidelity synthesis, yet systematic evaluation of models under long-context conditions remains largely underexplored. A comprehensive evaluation benchmark for long-form speech is indispensable for two reasons: 1) existing test scenarios are often confined to limited domains, leaving a significant gap with diverse downstream applications; 2) existing metrics overlook critical long-text factors such as consistency and coherence, and so fail to generalize reliably. To this end, we propose LFSBench, a comprehensive benchmark that decomposes “long-form speech quality” into specific, disentangled dimensions. LFSBench has three key properties. 1) Rich speech scenarios: focusing on long-form speech generation and multi-speaker dialog generation, LFSBench covers acoustic, semantic, and expressiveness challenges, and consists of 1,101 samples spanning 17 common speech scenarios. 2) Comprehensive evaluation dimensions: along the acoustics, semantics, and expressiveness axes, LFSBench defines an automated evaluation protocol with seven metrics to provide a comprehensive, accurate, and standardized assessment. 3) Valuable insights: through extensive experiments, we reveal that current models still struggle in highly expressive scenarios and exhibit a notable gap in consistency and hierarchy compared to real recordings.

Anthology ID: 2026.findings-acl.112
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 2365–2400
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.112/
Cite (ACL): Changhao Pan, Rui Yang, Han Wang, Zhuan Zhou, Xuming He, Wenxiang Guo, Ziyue Jiang, Ruiqi Li, Yu Zhang, Chenyuhao Wen, Ke Lei, Xiang Yin, Jingyu Lu, Zhiyuan Zhu, and Zhou Zhao. 2026. Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios. In Findings of the Association for Computational Linguistics: ACL 2026, pages 2365–2400, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios (Pan et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.112.pdf
Checklist: 2026.findings-acl.112.checklist.pdf