Learning Paralinguistic Features from Audiobooks through Style Voice Conversion

Zakaria Aldeneh, Matthew Perez, Emily Mower Provost


Abstract
Paralinguistics, the non-lexical components of speech, play a crucial role in human-human interaction. Models designed to recognize paralinguistic information, particularly speech emotion and style, are difficult to train because few labeled datasets are available. In this work, we present a new framework that enables a neural network to learn to extract paralinguistic attributes from speech using data that are not annotated for emotion. We assess the utility of the learned embeddings on the downstream tasks of emotion recognition and speaking style detection, demonstrating significant improvements over surface acoustic features as well as over embeddings extracted from other unsupervised approaches. Our work enables future systems to use the learned embedding extractor as a separate component capable of highlighting the paralinguistic components of speech.
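The abstract's closing sentence describes reusing a learned embedding extractor as a separate, frozen component in downstream tasks. The sketch below illustrates only that general pattern. It is not the authors' model or evaluation code, and everything in it is an assumption. `extract_embedding` is a hypothetical stand-in: a fixed random projection followed by mean pooling over frames. The frame features are synthetic toy data. A nearest-centroid classifier stands in for whatever downstream emotion or style classifier one would actually train.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, EMB_DIM = 40, 16  # hypothetical frame-feature and embedding sizes

# Hypothetical frozen extractor weights: fixed once, never updated downstream.
PROJECTION = rng.normal(scale=FEAT_DIM ** -0.5, size=(FEAT_DIM, EMB_DIM))


def extract_embedding(frames: np.ndarray) -> np.ndarray:
    """Map a (num_frames, FEAT_DIM) utterance to one EMB_DIM vector.

    Stand-in for a learned extractor: project each frame, apply a
    nonlinearity, then mean-pool over time so utterances of any length
    yield a fixed-size embedding.
    """
    return np.tanh(frames @ PROJECTION).mean(axis=0)


def fit_centroids(embeddings: np.ndarray, labels: np.ndarray):
    """Downstream classifier: store one mean embedding per class."""
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(embeddings: np.ndarray, classes: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each embedding to the class with the nearest centroid."""
    dists = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]


def toy_utterance(label: int) -> np.ndarray:
    """Synthetic utterance of random length; the class shifts the feature mean."""
    num_frames = int(rng.integers(50, 150))
    mean = -1.0 if label == 0 else 1.0
    return rng.normal(loc=mean, scale=0.5, size=(num_frames, FEAT_DIM))


labels = np.array([i % 2 for i in range(200)])
embeddings = np.stack([extract_embedding(toy_utterance(y)) for y in labels])

# Fit the downstream classifier on the first half, evaluate on the held-out second half.
classes, centroids = fit_centroids(embeddings[:100], labels[:100])
predictions = predict(embeddings[100:], classes, centroids)
accuracy = float((predictions == labels[100:]).mean())
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the extractor stays frozen, only the lightweight downstream classifier is fit on labeled data. This is the appeal of a reusable extractor when labeled emotion corpora are small.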
Anthology ID:
2021.naacl-main.377
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
4736–4745
URL:
https://aclanthology.org/2021.naacl-main.377
DOI:
10.18653/v1/2021.naacl-main.377
Cite (ACL):
Zakaria Aldeneh, Matthew Perez, and Emily Mower Provost. 2021. Learning Paralinguistic Features from Audiobooks through Style Voice Conversion. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4736–4745, Online. Association for Computational Linguistics.
Cite (Informal):
Learning Paralinguistic Features from Audiobooks through Style Voice Conversion (Aldeneh et al., NAACL 2021)
PDF:
https://preview.aclanthology.org/auto-file-uploads/2021.naacl-main.377.pdf
Video:
https://preview.aclanthology.org/auto-file-uploads/2021.naacl-main.377.mp4
Data
IEMOCAP