Romain Legrand
2024
Emotags: Computer-Assisted Verbal Labelling of Expressive Audiovisual Utterances for Expressive Multimodal TTS
Gérard Bailly | Romain Legrand | Martin Lenglet | Frédéric Elisei | Maëva Hueber | Olivier Perrotin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
We developed a web app for ascribing verbal descriptions to expressive audiovisual utterances. These descriptions are limited to lists of adjectives that are either suggested through navigation in emotional latent spaces, built using discriminant analysis of BERT embeddings, or entered freely by subjects. We show that such verbal descriptions, collected online via Prolific on a large dataset (310 participants and 12,620 labelled utterances to date), provide Expressive Multimodal Text-to-Speech Synthesis with precise verbal control over the desired emotional content.