@inproceedings{yang-etal-2022-self,
    title = "Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing",
author = "Yang, Hao and
Zhao, Jinming and
Haffari, Gholamreza and
Shareghi, Ehsan",
editor = "Goldberg, Yoav and
Kozareva, Zornitsa and
Zhang, Yue",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2022",
month = dec,
year = "2022",
address = "Abu Dhabi, United Arab Emirates",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/add-emnlp-2024-awards/2022.findings-emnlp.141/",
doi = "10.18653/v1/2022.findings-emnlp.141",
pages = "1952--1959",
    abstract = "Pre-trained speech Transformers have facilitated great success across various speech processing tasks. However, fine-tuning these encoders for downstream tasks requires sufficiently large training data to converge or to achieve state-of-the-art performance. In the text domain this has been partly attributed to the sub-optimality of the representation space in pre-trained Transformers. In this work, we take a sober look into pre-trained speech encoders and rewire their representation space without requiring any task-specific labels. Our method utilises a neutrally synthesised version of the audio inputs along with frame masking to construct positive pairs for contrastive self-supervised learning. When used for augmenting the wav2vec 2 encoder, we observe a consistent improvement of isotropy in the representation space. Our experiments on 6 speech processing tasks exhibit a significant convergence speedup during task fine-tuning as well as consistent task improvement, especially in low-resource settings."
}
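The abstract describes building positive pairs from two views of the same utterance (a neutrally synthesised version and a frame-masked version) and training with a contrastive self-supervised objective. The sketch below is not the authors' code; it is a minimal, hypothetical illustration of a standard InfoNCE-style contrastive loss over paired encoder outputs, assuming pooled 768-dimensional representations (wav2vec 2-sized) and random tensors standing in for the two views.

```python
# Hypothetical sketch of a contrastive objective over positive pairs built from
# two views of the same utterance. Shapes, names, and the pooling assumption
# are illustrative, not the paper's implementation.
import torch
import torch.nn.functional as F


def info_nce_loss(z_a: torch.Tensor, z_b: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE over paired representations.

    z_a, z_b: (N, D) embeddings where row i of z_a and row i of z_b form a
    positive pair (e.g., a frame-masked view and a neutrally synthesised view
    of the same utterance); all other rows act as in-batch negatives.
    """
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature              # (N, N) scaled cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetric cross-entropy: each view must identify its paired partner.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


if __name__ == "__main__":
    # Stand-ins for pooled encoder outputs of two augmented views of a batch
    # of 8 utterances with 768-dimensional representations.
    view_masked = torch.randn(8, 768)   # e.g., frame-masked input
    view_synth = torch.randn(8, 768)    # e.g., neutrally synthesised input
    loss = info_nce_loss(view_masked, view_synth)
    print(f"contrastive loss: {loss.item():.4f}")
```

In practice such a loss would be applied to the outputs of the pre-trained encoder being rewired, with the temperature and batch size treated as tuning knobs.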
Markdown (Informal)
[Self-supervised Rewiring of Pre-trained Speech Encoders: Towards Faster Fine-tuning with Less Labels in Speech Processing](https://preview.aclanthology.org/add-emnlp-2024-awards/2022.findings-emnlp.141/) (Yang et al., Findings 2022)