Enhancing Documentation of Hupa with Automatic Speech Recognition

Zoey Liu; Justin Spence; Emily Prud’hommeaux

doi:10.18653/v1/2022.computel-1.23

Abstract

This study investigates applications of automatic speech recognition (ASR) techniques to Hupa, a critically endangered Native American language from the Dene (Athabaskan) language family. Using approximately 9 hours and 12 minutes of spoken data produced by one elder who is a first-language Hupa speaker, we experimented with different evaluation schemes and training settings. On average, a fully connected deep neural network reached a word error rate of 35.26%. Our overall results illustrate the utility of ASR for making Hupa language documentation more accessible and usable. In addition, we found that when training acoustic models, using recordings with transcripts that were not carefully verified did not necessarily have a negative effect on model performance. This shows promise for speech corpora of indigenous languages that commonly include transcriptions produced by second-language speakers or linguists who have advanced knowledge of the language of interest.

Anthology ID: 2022.computel-1.23
Volume: Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Sarah Moeller, Antonios Anastasopoulos, Antti Arppe, Aditi Chaudhary, Atticus Harrigan, Josh Holden, Jordan Lachler, Alexis Palmer, Shruti Rijhwani, Lane Schwartz
Venue: ComputEL
Publisher: Association for Computational Linguistics
Pages: 187–192
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.computel-1.23/
DOI: 10.18653/v1/2022.computel-1.23
Cite (ACL): Zoey Liu, Justin Spence, and Emily Prud’hommeaux. 2022. Enhancing Documentation of Hupa with Automatic Speech Recognition. In Proceedings of the Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 187–192, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): Enhancing Documentation of Hupa with Automatic Speech Recognition (Liu et al., ComputEL 2022)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/2022.computel-1.23.pdf
