H. Schwartz

2025

Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies
Vishnu Raja | Adithya V Ganesan | Anand Syamkumar | Ritwik Banerjee | H. Schwartz
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

State-of-the-art automatic speech recognition (ASR) models like Whisper perform poorly on atypical speech, such as that produced by individuals with dysarthria. Past work on atypical speech has mostly investigated fully personalized (or idiosyncratic) models, but modeling strategies that can both generalize and handle idiosyncrasy could be more effective for capturing atypical speech. To investigate this, we compare four strategies: (a) *normative* models trained on typical speech (no personalization), (b) *idiosyncratic* models completely personalized to individuals, (c) *dysarthric-normative* models trained on other dysarthric speakers, and (d) *dysarthric-idiosyncratic* models that combine strategies by first modeling normative patterns before adapting to individual speech. In this case study, we find that the dysarthric-idiosyncratic model performs better than the idiosyncratic approach while requiring half as much personalized data (36.43 WER with a training size of 128 vs. 36.99 with 256). Further, we find that tuning the speech encoder alone (as opposed to the LM decoder) yields the best results, reducing word error rate from 71% to 32% on average. Our findings highlight the value of leveraging both normative (cross-speaker) and idiosyncratic (speaker-specific) patterns to improve ASR for underrepresented speech populations. [GitHub: VishnuRaja98/Dysarthric-Speech-Transcription](https://github.com/VishnuRaja98/Dysarthric-Speech-Transcription)
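The abstract above reports results as word error rate (WER). As a minimal sketch of how that metric is commonly computed (word-level edit distance divided by the number of reference words), the following may help; the function name `word_error_rate` and the whitespace tokenization are assumptions for illustration and are not taken from the paper or its repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, which is
    an assumption; real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 (100%) when the hypothesis is much longer than the reference.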

Responsible use of Authorship Verification (AV) systems requires not only high accuracy but also interpretable solutions. More importantly, when systems are used to make decisions with real-world consequences, the model's prediction must be explainable using interpretable features that can be traced to the original texts. Neural methods achieve high accuracies, but their representations lack direct interpretability. Furthermore, LLM predictions cannot be explained faithfully: if an explanation is given for a prediction, it does not represent the reasoning process behind the model's prediction. In this paper, we introduce Residualized Similarity (RS), a novel method that supplements systems using interpretable features with a neural network to improve their performance while maintaining interpretability. Authorship verification is fundamentally a similarity task, where the goal is to measure how alike two documents are. The key idea is to use the neural network to predict a similarity residual, i.e., the error in the similarity predicted by the interpretable system. Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can also show how and to what degree the final prediction is faithful and interpretable.
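The residual idea described above can be sketched in a few lines. This is one possible reading under stated assumptions, not the authors' implementation: cosine similarity over feature vectors stands in for the interpretable system, the residual is taken as gold similarity minus interpretable similarity, the final score is their sum, and the predicted residual is a hard-coded placeholder where a trained neural network would go. All function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two interpretable feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def residual_target(gold_similarity: float, interpretable_similarity: float) -> float:
    """Training target for the residual model: the interpretable system's error.

    The sign convention (gold minus interpretable) is an assumption.
    """
    return gold_similarity - interpretable_similarity


def residualized_similarity(interpretable_similarity: float,
                            predicted_residual: float) -> float:
    """Final score: interpretable similarity corrected by the predicted residual."""
    return interpretable_similarity + predicted_residual


if __name__ == "__main__":
    # Hypothetical interpretable feature vectors for two documents.
    doc_a = [1.0, 0.0, 1.0]
    doc_b = [1.0, 1.0, 0.0]

    s_interp = cosine(doc_a, doc_b)
    target = residual_target(gold_similarity=1.0, interpretable_similarity=s_interp)

    # Placeholder for a neural network's output; not a real model prediction.
    predicted = 0.3
    final = residualized_similarity(s_interp, predicted)
    print(s_interp, target, final)
```

Under this reading, the size of the predicted residual relative to the interpretable similarity gives a rough sense of how much of the final score is explained by the interpretable features alone.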
