Customizing ASR for Language Documentation and Resource Prioritization

Alexandra Fort, Shobhana Lakshmi Chelliah


Abstract
Language documentation research stands to benefit from integrating ASR models, especially through assisted transcription of audio recordings. Recent advances in ASR for low-resource languages show that general, multilingual models can be adapted to unseen languages with limited fine-tuning data, which supports the creation of custom ASR models. However, collecting and preparing that fine-tuning data still requires resources, which motivates exploring how best to allocate them during data collection and preparation. This paper outlines important considerations for collecting and preparing data to customize an ASR model for use in language documentation projects. Using the development of a Lamkang ASR model as an example, it describes how tasks within a language documentation project can be prioritized by analyzing the relative impact on ASR model performance of time spent on transcription correction versus time spent on manual alignment. The results suggest prioritizing transcription correction over manual alignment of data, and they suggest that fine-tuning multilingual ASR systems produces better results than zero-shot ASR models, despite recent advances in the technology.
Anthology ID:
2026.customnlp4u-1.13
Volume:
Proceedings of the Second Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Sheshera Mysore, Sachin Kumar, Vidhisha Balachandran, Shirley Anugrah Hayati, Faeze Brahman, Hanane Nour Moussa, Alireza Salemi
Venues:
CustomNLP4U | WS
Publisher:
Association for Computational Linguistics
Pages:
149–159
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.customnlp4u-1.13/
Cite (ACL):
Alexandra Fort and Shobhana Lakshmi Chelliah. 2026. Customizing ASR for Language Documentation and Resource Prioritization. In Proceedings of the Second Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U), pages 149–159, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
Customizing ASR for Language Documentation and Resource Prioritization (Fort & Chelliah, CustomNLP4U 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.customnlp4u-1.13.pdf