Towards Privacy-Preserving Fine-Tuning: Anonymization of Aphasic Speech for Effective ASR

Sebastian Hofstetter, Timo Baumann


Abstract
The scarcity of publicly available aphasic speech data, driven largely by privacy concerns, poses a significant barrier to fine-tuning Automatic Speech Recognition (ASR) systems in this domain. This study investigates the privacy–utility trade-off of speech anonymization as a strategy to increase data availability. A signal-based McAdams anonymization method is applied to a subset of the AphasiaBank corpus comprising approximately 132 hours of speech from 425 individuals. Privacy is evaluated with an ECAPA-TDNN-based Automatic Speaker Verification (ASV) system, using the Equal Error Rate (EER) as the metric. Linguistic utility is assessed by the Word Error Rate (WER) of a wav2vec 2.0 ASR model, tested in multiple conditions: pretrained only, and fine-tuned on either unprotected or anonymized audio. Our results show that fine-tuning on anonymized aphasic speech data improves ASR performance by 18 % compared to the performance of generic models on non-anonymized speech. Crucially, this gain in utility is achieved alongside substantial privacy protection, with anonymization increasing privacy by 440 % compared to sharing unprotected speech. This work thus provides a proof of concept, demonstrating that speech anonymization mitigates privacy risks and can thereby help tackle data scarcity and support the development of more effective ASR systems for people with aphasia.
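The two metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names, the whitespace tokenization, and the simple threshold sweep are assumptions made for illustration only. WER is computed as the word-level edit distance divided by the reference length; EER is the operating point at which the false acceptance and false rejection rates of a speaker verifier coincide. In an anonymization setting, a higher EER is usually read as better privacy, since it means the verifier re-identifies speakers less reliably.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Assumes plain whitespace tokenization (an illustrative choice).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def equal_error_rate(target_scores, nontarget_scores) -> float:
    """Approximate EER by sweeping every observed score as a threshold.

    target_scores: verifier scores for same-speaker trials.
    nontarget_scores: verifier scores for different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: same-speaker trials scored below t.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials scored at or above t.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Toy inputs, invented for illustration; not data from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    print(equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.6, 0.4, 0.2, 0.1]))
```

In practice, evaluation toolkits typically interpolate between thresholds and normalize transcripts before scoring, so values from this sketch may differ slightly from those of a full evaluation pipeline.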
Anthology ID:
2026.lrec-main.446
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Pages:
5666–5676
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.446/
Cite (ACL):
Sebastian Hofstetter and Timo Baumann. 2026. Towards Privacy-Preserving Fine-Tuning: Anonymization of Aphasic Speech for Effective ASR. International Conference on Language Resources and Evaluation, main:5666–5676.
Cite (Informal):
Towards Privacy-Preserving Fine-Tuning: Anonymization of Aphasic Speech for Effective ASR (Hofstetter & Baumann, LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.446.pdf