Self-Supervised Contrastive Learning for Content-Centric Speech Representation
Lijinlong Lijinlong, Ling Dong, Wenjun Wang, Zhengtao Yu, Shengxiang Gao
Abstract
Self-supervised learning (SSL) speech models have achieved remarkable performance across a variety of tasks, and the learned representations are often general enough to transfer to multiple downstream tasks. However, these representations encode both speech content and paralinguistic information, which can be redundant for content-focused tasks, and decoupling this redundant information is challenging. To address this issue, we propose a Self-Supervised Contrastive Representation Learning method (SSCRL), which disentangles paralinguistic information from speech content by aligning representations of speech with similar content in the feature space, using self-supervised contrastive learning with pitch-perturbed and speaker-perturbed features. Experimental results demonstrate that the proposed method, fine-tuned on the LibriSpeech 100-hour dataset, achieves superior performance on all content-related tasks in the SUPERB benchmark, generally outperforming prior approaches.
- Anthology ID:
- 2025.ccl-1.61
- Volume:
- Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025)
- Month:
- August
- Year:
- 2025
- Address:
- Jinan, China
- Editors:
- Maosong Sun, Peiyong Duan, Zhiyuan Liu, Ruifeng Xu, Weiwei Sun
- Venue:
- CCL
- Publisher:
- Chinese Information Processing Society of China
- Pages:
- 807–817
- URL:
- https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.61/
- Cite (ACL):
- Lijinlong Lijinlong, Ling Dong, Wenjun Wang, Zhengtao Yu, and Shengxiang Gao. 2025. Self-Supervised Contrastive Learning for Content-Centric Speech Representation. In Proceedings of the 24th China National Conference on Computational Linguistics (CCL 2025), pages 807–817, Jinan, China. Chinese Information Processing Society of China.
- Cite (Informal):
- Self-Supervised Contrastive Learning for Content-Centric Speech Representation (Lijinlong et al., CCL 2025)
- PDF:
- https://preview.aclanthology.org/ingest-ccl/2025.ccl-1.61.pdf
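As a rough illustration of the contrastive alignment the abstract describes, the sketch below computes a symmetric InfoNCE-style loss between a batch of "clean" utterance embeddings and their pitch/speaker-perturbed views: representations sharing the same content are pulled together, while other utterances in the batch serve as negatives. This is a generic contrastive objective written for illustration only; the function name, dimensions, and temperature are assumptions, not the authors' exact SSCRL formulation.

```python
import numpy as np

def info_nce_loss(z_clean, z_perturbed, temperature=0.1):
    """Symmetric InfoNCE over a batch (hypothetical sketch, not the paper's
    exact objective): each clean embedding is pulled toward its perturbed
    view and pushed away from the other utterances in the batch."""
    # L2-normalize so the dot product below is cosine similarity.
    z1 = z_clean / np.linalg.norm(z_clean, axis=1, keepdims=True)
    z2 = z_perturbed / np.linalg.norm(z_perturbed, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature           # (B, B) similarity matrix
    idx = np.arange(len(z1))                   # positives on the diagonal

    def xent(l):
        # Cross-entropy with the diagonal as the target class.
        l = l - l.max(axis=1, keepdims=True)   # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # Average both directions: clean->perturbed and perturbed->clean.
    return 0.5 * (xent(logits) + xent(logits.T))

# Toy batch: 4 utterances with 8-dim embeddings. A perturbed view that
# preserves content (small noise) should score a much lower loss than an
# unrelated batch.
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
aligned_loss = info_nce_loss(z, z + 0.05 * rng.normal(size=(4, 8)))
mismatched_loss = info_nce_loss(z, rng.normal(size=(4, 8)))
```

Under this objective, pitch- and speaker-perturbed views of the same utterance act as positives, so the encoder is pushed to keep only the information invariant to those perturbations, i.e. the content.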