SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

Hsiang-Sheng Tsai; Heng-Jui Chang; Wen-Chin Huang; Zili Huang; Kushal Lakhotia; Shu-wen Yang; Shuyan Dong; Andy Liu; Cheng-I Lai; Jiatong Shi; Xuankai Chang; Phil Hall; Hsuan-Jui Chen; Shang-Wen Li; Shinji Watanabe; Abdelrahman Mohamed; Hung-yi Lee

doi:10.18653/v1/2022.acl-long.580

Abstract

Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well to multiple tasks. However, the lack of a consistent evaluation methodology hinders a holistic understanding of the efficacy of such models. SUPERB was a step towards introducing a common benchmark to evaluate pre-trained models across various speech tasks. In this paper, we introduce SUPERB-SG, a new benchmark focusing on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB. We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain and quality across different types of tasks. It entails freezing pre-trained model parameters and using only simple task-specific trainable heads. The goal is to be inclusive of all researchers and to encourage efficient use of computational resources. We also show that the task diversity of SUPERB-SG, coupled with limited task supervision, is an effective recipe for evaluating the generalizability of model representations.
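
The evaluation setup described in the abstract (frozen pre-trained parameters plus a simple trainable head) can be illustrated with a minimal sketch. It is not the benchmark's implementation: the fixed projection `UPSTREAM_W`, the logistic-regression head, the toy data, and all function names are hypothetical stand-ins chosen only to show which parameters receive updates.

```python
import math

# Hypothetical stand-in for a pre-trained upstream model: a fixed projection.
# Stored as tuples and never updated, mirroring "frozen" parameters.
UPSTREAM_W = ((1.0, 0.5), (-0.5, 1.0), (0.3, -0.8))

def upstream(x):
    """Frozen feature extractor: tanh(W x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in UPSTREAM_W]

def head(h, weights, bias):
    """Simple trainable head (an assumption): logistic regression on frozen features."""
    z = sum(w * hi for w, hi in zip(weights, h)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(data, weights, bias):
    """Mean binary cross-entropy of the head on top of the frozen features."""
    total = 0.0
    for x, y in data:
        p = head(upstream(x), weights, bias)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train_head(data, steps=200, lr=0.5):
    """Full-batch gradient descent that updates only the head's parameters."""
    weights, bias = [0.0] * len(UPSTREAM_W), 0.0
    # The upstream is frozen, so its features can be computed once and reused.
    feats = [(upstream(x), y) for x, y in data]
    for _ in range(steps):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for h, y in feats:
            err = head(h, weights, bias) - y  # gradient of the loss w.r.t. the logit
            grad_w = [g + err * hi for g, hi in zip(grad_w, h)]
            grad_b += err
        weights = [w - lr * g / len(feats) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(feats)
    return weights, bias

# Toy binary task (illustrative only, not benchmark data).
TOY_DATA = [([1.0, 0.0], 1), ([0.8, 0.3], 1), ([-1.0, 0.0], 0), ([-0.7, -0.2], 0)]

weights, bias = train_head(TOY_DATA)
print("loss before:", log_loss(TOY_DATA, [0.0] * len(UPSTREAM_W), 0.0))
print("loss after: ", log_loss(TOY_DATA, weights, bias))
```

Because only the head is trained, the upstream features can be cached and the per-task cost stays small, which is the property the abstract ties to efficient use of computational resources.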

Anthology ID: 2022.acl-long.580
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 8479–8492
URL: https://aclanthology.org/2022.acl-long.580
DOI: 10.18653/v1/2022.acl-long.580
Cite (ACL): Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy Liu, Cheng-I Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee. 2022. SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8479–8492, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities (Tsai et al., ACL 2022)
PDF: https://preview.aclanthology.org/remove-xml-comments/2022.acl-long.580.pdf
Video: https://preview.aclanthology.org/remove-xml-comments/2022.acl-long.580.mp4
Code: s3prl/s3prl
Data: CoVoST, Common Voice, LibriMix, VoiceBank + DEMAND
