Self-Supervised Contrastive Learning for Content-Centric Speech Representation

Lijinlong Lijinlong, Ling Dong, Wenjun Wang, Zhengtao Yu, Shengxiang Gao


Abstract
"Self-supervised learning (SSL) speech models have achieved remarkable performance across various tasks, with the learned representations often exhibiting a high degree of generality and applicability to multiple downstream tasks. However, these representations contain both speech content and some paralinguistic information, which may be redundant for content-focused tasks.Decoupling this redundant information is challenging. To address this issue, we propose a Self-Supervised Contrastive Representation Learning method (SSCRL), which effectively disentangles paralinguistic information from speech content by aligning similar content speech representations in the feature space using self-supervised contrastive learning with pitch perturbation and speaker perturbation features. Experimental results demonstrate that the proposed method, when fine-tuned on the LibriSpeech 100-hour dataset, achieves superior performance across all content-related tasks in the SUPERB Benchmark, generally outperforming prior approaches."
Anthology ID: 2025.ccl-1.61
Volume: Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
Month: August
Year: 2025
Address: Jinan, China
Editors: Maosong Sun, Peiyong Duan, Zhiyuan Liu, Ruifeng Xu, Weiwei Sun
Venue: CCL
Publisher: Chinese Information Processing Society of China
Pages: 807–817
URL: https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.61/
Cite (ACL): Lijinlong Lijinlong, Ling Dong, Wenjun Wang, Zhengtao Yu, and Shengxiang Gao. 2025. Self-Supervised Contrastive Learning for Content-Centric Speech Representation. In Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025), pages 807–817, Jinan, China. Chinese Information Processing Society of China.
Cite (Informal): Self-Supervised Contrastive Learning for Content-Centric Speech Representation (Lijinlong et al., CCL 2025)
PDF: https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.61.pdf