SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions
Massa Baali, Sarthak Bisht, Francisco Teixeira, Kateryna Shapovalenko, Rita Singh, Bhiksha Raj
Abstract
Speaker verification (SV) models are increasingly integrated into security, personalization, and access control systems, yet their robustness to many real-world challenges remains inadequately benchmarked. Real-world systems face diverse conditions, some naturally occurring and others purposely or even maliciously created, which introduce mismatches between enrollment and test data and degrade performance. Ideally, the effect of all of these on model performance should be benchmarked; however, existing benchmarks fall short, generally evaluating only a subset of potential conditions and missing others entirely. We introduce SVeritas, the Speaker Verification tasks benchmark suite, which evaluates the performance of speaker verification systems under an extensive variety of stressors, including “natural” variations such as the duration, spontaneity, and content of the recordings; background conditions such as noise, microphone distance, reverberation, and channel mismatches; recording-condition influences such as audio bandwidth and the effect of various codecs; physical influences such as the age and health conditions of the speaker; as well as the susceptibility of the models to spoofing and adversarial attacks. While several benchmarks exist that each cover some of these issues, SVeritas is the first comprehensive evaluation that not only includes all of these but also several entirely new, yet nonetheless important, real-life conditions that have not previously been benchmarked. We use SVeritas to evaluate several state-of-the-art SV models and observe that while some architectures maintain stability under common distortions, they suffer substantial performance degradation in scenarios involving cross-language trials, age mismatches, and codec-induced compression. Extending our analysis across demographic subgroups, we further identify disparities in robustness across age groups, gender, and linguistic backgrounds. By standardizing evaluation under realistic and synthetic stress conditions, SVeritas enables precise diagnosis of model weaknesses and establishes a foundation for advancing equitable and reliable speaker verification systems.
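The abstract frames SVeritas as scoring enrollment–test trials under a range of stressors and reporting how verification performance degrades. As a rough illustration only (not the SVeritas implementation), the sketch below scores trials with cosine similarity, computes the equal error rate (EER), and applies one example stressor, additive noise at a fixed SNR; the embedding extractor (`embed`) and the trial data are assumed placeholders.

```python
import numpy as np

def cosine_scores(enroll: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Cosine similarity between paired enrollment and test embeddings of shape (N, D)."""
    enroll = enroll / np.linalg.norm(enroll, axis=1, keepdims=True)
    test = test / np.linalg.norm(test, axis=1, keepdims=True)
    return np.sum(enroll * test, axis=1)

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """EER: sweep the decision threshold and return the point where
    the false-accept and false-reject rates coincide."""
    order = np.argsort(-scores)                  # accept highest-scoring trials first
    is_target = labels[order].astype(bool)       # 1 = same speaker, 0 = impostor
    far = np.cumsum(~is_target) / max((~is_target).sum(), 1)    # false-accept rate
    frr = 1.0 - np.cumsum(is_target) / max(is_target.sum(), 1)  # false-reject rate
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2)

def add_noise(wave: np.ndarray, snr_db: float) -> np.ndarray:
    """One example stressor: additive white noise at a target SNR in dB."""
    noise = np.random.randn(*wave.shape)
    target_noise_power = np.mean(wave ** 2) / (10 ** (snr_db / 10))
    return wave + noise * np.sqrt(target_noise_power / np.mean(noise ** 2))

# Hypothetical usage: `embed` maps waveforms to fixed-size embeddings and stands in
# for whichever SV model is being evaluated; `labels` marks target vs. impostor trials.
# clean_eer    = equal_error_rate(cosine_scores(embed(enroll_wavs), embed(test_wavs)), labels)
# stressed_eer = equal_error_rate(
#     cosine_scores(embed(enroll_wavs),
#                   embed([add_noise(w, snr_db=5) for w in test_wavs])), labels)
```

Comparing the clean and stressed EERs in this way mirrors, at a very small scale, the kind of per-condition degradation analysis the benchmark reports.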
- Anthology ID: 2025.findings-emnlp.516
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 9714–9731
- URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.516/
- DOI: 10.18653/v1/2025.findings-emnlp.516
- Cite (ACL): Massa Baali, Sarthak Bisht, Francisco Teixeira, Kateryna Shapovalenko, Rita Singh, and Bhiksha Raj. 2025. SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 9714–9731, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions (Baali et al., Findings 2025)
- PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.516.pdf