Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems

Daniel Platnick; Bishoy Abdelnour; Eamon Earl; Rahul Kumar; Zahra Rezaei; Thomas Tsangaris; Faraj Lagum

Abstract

In recent years, demand for speech-to-speech translation (S2ST) systems has grown in industry settings. Although they have been successfully commercialized, cloning-based S2ST systems expose their distributors to liability when misused by individuals and can infringe on personality rights when exploited by media organizations. This work proposes a regulated S2ST framework called Preset-Voice Matching (PVM). PVM removes cross-lingual voice cloning from S2ST by first matching the input voice to a similar voice in the target language, drawn from speakers who have given prior consent. With this separation, PVM avoids cloning the input speaker, ensuring that PVM systems comply with regulations and reducing the risk of misuse. Our results demonstrate that PVM can significantly improve both S2ST system run-time in multi-speaker settings and the naturalness of S2ST synthesized speech. To our knowledge, PVM is the first explicitly regulated S2ST framework leveraging similarly-matched preset-voices for dynamic S2ST tasks.
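
The matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how voice similarity is measured, so everything below is an assumption made for illustration: voices are represented as fixed-length embedding vectors, similarity is cosine similarity, and the names `match_preset_voice`, `PRESETS`, `preset_a`, and `preset_b` are hypothetical rather than taken from the paper.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction to compare")
    return dot / (norm_a * norm_b)


def match_preset_voice(
    input_embedding: Sequence[float],
    preset_voices: Dict[str, Dict[str, Sequence[float]]],
    target_language: str,
) -> str:
    """Return the name of the most similar preset voice in the target language.

    The input speaker's embedding is only compared against the presets;
    it is never used to synthesize speech, so the input voice is not cloned.
    """
    candidates = preset_voices.get(target_language, {})
    if not candidates:
        raise KeyError(f"no preset voices for language {target_language!r}")
    return max(
        candidates,
        key=lambda name: cosine_similarity(input_embedding, candidates[name]),
    )


# Hypothetical toy library: language code -> preset name -> embedding.
# Assumption: every preset comes from a speaker who consented in advance.
PRESETS: Dict[str, Dict[str, Sequence[float]]] = {
    "fr": {
        "preset_a": [0.9, 0.1, 0.0],
        "preset_b": [0.1, 0.9, 0.2],
    },
}

if __name__ == "__main__":
    # Toy embedding standing in for the input speaker's voice.
    print(match_preset_voice([0.8, 0.2, 0.1], PRESETS, "fr"))
```

In a pipeline built along these lines, the selected preset name would be handed to a target-language speech synthesizer, and the input embedding would be discarded once the match is made.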

Anthology ID: 2024.privatenlp-1.6
Volume: Proceedings of the Fifth Workshop on Privacy in Natural Language Processing
Month: August
Year: 2024
Address: Bangkok, Thailand
Editors: Ivan Habernal, Sepideh Ghanavati, Abhilasha Ravichander, Vijayanta Jain, Patricia Thaine, Timour Igamberdiev, Niloofar Mireshghallah, Oluwaseyi Feyisetan
Venues: PrivateNLP | WS
Publisher: Association for Computational Linguistics
Pages: 52–62
URL: https://aclanthology.org/2024.privatenlp-1.6
Cite (ACL): Daniel Platnick, Bishoy Abdelnour, Eamon Earl, Rahul Kumar, Zahra Rezaei, Thomas Tsangaris, and Faraj Lagum. 2024. Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems. In Proceedings of the Fifth Workshop on Privacy in Natural Language Processing, pages 52–62, Bangkok, Thailand. Association for Computational Linguistics.
Cite (Informal): Preset-Voice Matching for Privacy Regulated Speech-to-Speech Translation Systems (Platnick et al., PrivateNLP-WS 2024)
PDF: https://preview.aclanthology.org/nschneid-patch-5/2024.privatenlp-1.6.pdf
