N-CORE: N-View Consistency Regularization for Disentangled Representation Learning in Nonverbal Vocalizations

Siddhant Bikram Shah, Kristina T. Johnson


Abstract
Nonverbal vocalizations are an essential component of human communication, conveying rich information without linguistic content. However, their computational analysis is hindered by a lack of lexical anchors in the data, compounded by biased and imbalanced data distributions. While disentangled representation learning has shown promise in isolating specific speech features, its application to nonverbal vocalizations remains unexplored. In this paper, we introduce N-CORE, a novel backbone-agnostic framework designed to disentangle intertwined features like emotion and speaker information from nonverbal vocalizations by leveraging N views of audio samples to learn invariance to specific transformations. N-CORE achieves competitive performance compared to state-of-the-art methods for emotion and speaker classification on the VIVAE, ReCANVo, and ReCANVo-Balanced datasets. We further propose an emotion perturbation function that disrupts affective information while preserving speaker information in audio signals for emotion-invariant speaker classification. Our work informs research directions in paralinguistic speech processing, including clinical diagnoses of atypical speech and longitudinal analysis of communicative development. Our code is available at https://github.com/SiddhantBikram/N-CORE.
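The abstract does not define the N-view objective itself. As a rough illustration of what "leveraging N views of audio samples to learn invariance" can mean in general, the sketch below penalizes disagreement between the embeddings of N transformed views of the same clip. Everything here is an assumption for illustration: the function name `n_view_consistency_loss`, the `(N, B, D)` array layout, and the centroid-based squared-distance penalty are hypothetical and are not N-CORE's actual loss. The authors' implementation is in the linked repository.

```python
import numpy as np


def n_view_consistency_loss(view_embeddings: np.ndarray) -> float:
    """Hypothetical N-view consistency penalty (not the paper's loss).

    view_embeddings: array of shape (N, B, D) holding embeddings of
    N transformed views for each of B clips, with D features each.
    Returns the mean squared distance of each L2-normalised view
    embedding from the per-clip centroid over the N views. For unit
    vectors this equals 1 - ||centroid||^2 per clip, so the value lies
    in [0, 1] and is 0 exactly when all views agree.
    """
    if view_embeddings.ndim != 3:
        raise ValueError("expected an array of shape (N, B, D)")
    # L2-normalise each embedding so the penalty ignores vector scale.
    norms = np.linalg.norm(view_embeddings, axis=-1, keepdims=True)
    z = view_embeddings / np.maximum(norms, 1e-12)
    # Per-clip centroid across the N views: shape (1, B, D).
    centroid = z.mean(axis=0, keepdims=True)
    # Squared distance to the centroid, averaged over views and clips.
    return float(((z - centroid) ** 2).sum(axis=-1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for encoder outputs: 4 views of 8 clips, 16 dims.
    clean = np.repeat(rng.normal(size=(1, 8, 16)), 4, axis=0)
    perturbed = clean + rng.normal(scale=0.5, size=clean.shape)
    print("identical views:", n_view_consistency_loss(clean))
    print("perturbed views:", n_view_consistency_loss(perturbed))
```

In a training loop, a term like this would typically be added to a classification loss so that the encoder is pushed to map every view of a clip to the same point; which transformations generate the views determines which factors the representation becomes invariant to.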
Anthology ID:
2025.emnlp-main.1693
Volume:
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
33362–33379
URL:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.1693/
DOI:
10.18653/v1/2025.emnlp-main.1693
Cite (ACL):
Siddhant Bikram Shah and Kristina T. Johnson. 2025. N-CORE: N-View Consistency Regularization for Disentangled Representation Learning in Nonverbal Vocalizations. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 33362–33379, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
N-CORE: N-View Consistency Regularization for Disentangled Representation Learning in Nonverbal Vocalizations (Shah & Johnson, EMNLP 2025)
PDF:
https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.emnlp-main.1693.pdf
Checklist:
2025.emnlp-main.1693.checklist.pdf