EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models
Maureen de Seyssel, Antony D’Avirro, Adina Williams, Emmanuel Dupoux
Abstract
We introduce EmphAssess, a prosodic benchmark designed to evaluate the capability of speech-to-speech models to encode and reproduce prosodic emphasis. We apply this to two tasks: speech resynthesis and speech-to-speech translation. In both cases, the benchmark evaluates the ability of the model to encode emphasis in the speech input and accurately reproduce it in the output, potentially across a change of speaker and language. As part of the evaluation pipeline, we introduce EmphaClass, a new model that classifies emphasis at the frame or word level.
- Anthology ID:
- 2024.emnlp-main.30
- Volume:
- Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
- Month:
- November
- Year:
- 2024
- Address:
- Miami, Florida, USA
- Editors:
- Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
- Venue:
- EMNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 495–507
- URL:
- https://preview.aclanthology.org/fix-sig-urls/2024.emnlp-main.30/
- DOI:
- 10.18653/v1/2024.emnlp-main.30
- Cite (ACL):
- Maureen de Seyssel, Antony D’Avirro, Adina Williams, and Emmanuel Dupoux. 2024. EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 495–507, Miami, Florida, USA. Association for Computational Linguistics.
- Cite (Informal):
- EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models (de Seyssel et al., EMNLP 2024)
- PDF:
- https://preview.aclanthology.org/fix-sig-urls/2024.emnlp-main.30.pdf