Voice synthesis in Polish and English - analyzing prediction differences in speaker verification systems

Joanna Gajewska; Alicja Martinek; Michał J. Ołowski; Ewelina Bartuzi-Trokielewicz

Abstract

Deep learning has significantly enhanced voice synthesis, yielding realistic audio capable of mimicking individual voices. This progress, however, raises security concerns due to the potential misuse of audio deepfakes. Our research examines the effects of deepfakes on speaker recognition systems across English and Polish corpora, assessing both Text-to-Speech and Voice Conversion methods. We focus on the role of biometric similarity in the effectiveness of impersonations and find that synthetic voices can preserve speaker-specific traits, posing risks of unauthorized access. The study's key contributions include analyzing voice synthesis across languages, evaluating biometric resemblance in voice conversion, and contrasting the Text-to-Speech and Voice Conversion paradigms. These insights emphasize the need for improved biometric security against audio deepfake threats.

Anthology ID: 2025.coling-main.643
Volume: Proceedings of the 31st International Conference on Computational Linguistics
Month: January
Year: 2025
Address: Abu Dhabi, UAE
Editors: Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
Venue: COLING
Publisher: Association for Computational Linguistics
Pages: 9618–9629
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.coling-main.643/
Cite (ACL): Joanna Gajewska, Alicja Martinek, Michał J. Ołowski, and Ewelina Bartuzi-Trokielewicz. 2025. Voice synthesis in Polish and English - analyzing prediction differences in speaker verification systems. In Proceedings of the 31st International Conference on Computational Linguistics, pages 9618–9629, Abu Dhabi, UAE. Association for Computational Linguistics.
Cite (Informal): Voice synthesis in Polish and English - analyzing prediction differences in speaker verification systems (Gajewska et al., COLING 2025)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.coling-main.643.pdf
