A Fine-tuned ASR Model for Historical American Dialect Recordings

Steven Coats

Abstract

This paper introduces DASS2019_NLP, a newly cleaned and curated version of the Digital Archive of Southern Speech, a major historical resource for the study of Southern American English, together with six Whisper ASR models fine-tuned on the data. The 344 hours of conversational speech were recorded by fieldworkers between 1969 and 1983 across the Southern United States. Each Whisper model was fine-tuned on DASS2019_NLP, then evaluated on held-out DASS2019_NLP data, a subset of the Corpus of Regional African American Language (CORAAL), and a subset of Common Voice (CV). The fine-tuned models show consistent learning trajectories and achieve an average 37% reduction in word error rate (WER) on in-domain data relative to the baseline models. Notably, they also improve transcription accuracy on CORAAL, suggesting enhanced robustness to African American English. As expected given the mismatch between read and conversational speech styles, accuracy on CV generally favors the OpenAI baselines. Both the DASS2019_NLP dataset and the best-performing fine-tuned model (whisper-large-v3-DASS-ct2) have been publicly released. These resources provide new tools for quantitative research in historical sociolinguistics, facilitating large-scale analyses of phonological, lexical, and grammatical change in Southern and African American English.

Anthology ID: 2026.lrec-main.107
Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month: May
Year: 2026
Address: Palma de Mallorca, Spain
Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue: LREC
Publisher: ELRA Language Resource Association
Pages: 1372–1381
URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.107/
Cite (ACL): Steven Coats. 2026. A Fine-tuned ASR Model for Historical American Dialect Recordings. International Conference on Language Resources and Evaluation, main:1372–1381.
Cite (Informal): A Fine-tuned ASR Model for Historical American Dialect Recordings (Coats, LREC 2026)
PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.107.pdf
