2025
VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music
Jiatong Shi | Hye-jin Shim | Jinchuan Tian | Siddhant Arora | Haibin Wu | Darius Petermann | Jia Qi Yip | You Zhang | Yuxun Tang | Wangyou Zhang | Dareen Safar Alharthi | Yichen Huang | Koichi Saito | Jionghao Han | Yiwen Zhao | Chris Donahue | Shinji Watanabe
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (System Demonstrations)
In this work, we introduce VERSA, a unified and standardized evaluation toolkit designed for various speech, audio, and music signals. The toolkit features a Pythonic interface with flexible configuration and dependency control, making it user-friendly and efficient. With full installation, VERSA offers 65 metrics with 729 metric variations based on different configurations. These metrics encompass evaluations that draw on diverse external resources, including matching and non-matching reference audio, text transcriptions, and text captions. As a lightweight yet comprehensive toolkit, VERSA is versatile enough to support evaluation across a wide range of downstream scenarios. To demonstrate its capabilities, this work highlights example use cases for VERSA, including audio coding, speech synthesis, speech enhancement, singing synthesis, and music generation. The toolkit is available at https://github.com/shinjiwlab/versa.
2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
Haibin Wu | Ho-Lam Chung | Yi-Cheng Lin | Yuan-Kuei Wu | Xuanjun Chen | Yu-Chi Pai | Hsiu-Hsuan Wang | Kai-Wei Chang | Alexander Liu | Hung-yi Lee
Findings of the Association for Computational Linguistics: ACL 2024
The sound codec’s dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance. Recent years have witnessed significant developments in codec models. The ideal sound codec should preserve content, paralinguistics, speakers, and audio information. However, the question of which codec achieves optimal sound information preservation remains unanswered, because different papers evaluate models under their own selected experimental settings. This study introduces Codec-SUPERB, an acronym for Codec sound processing Universal PERformance Benchmark. It is an ecosystem designed to assess codec models across representative sound applications and signal-level metrics rooted in sound domain knowledge. Codec-SUPERB simplifies result sharing through an online leaderboard, promoting collaboration within a community-driven benchmark database and thereby stimulating new development cycles for codecs. Furthermore, we undertake an in-depth analysis to offer insights into codec models from both application and signal perspectives, diverging from previous codec papers, which mainly concentrate on signal-level comparisons. Finally, we will release code, the leaderboard, and data to accelerate progress within the community.