Speech Technologies with Fieldwork Recordings: the Case of Haitian Creole
William N. Havard, Renauld Govain, Benjamin Lecouteux, Emmanuel Schang
Abstract
We use 40-year-old digitised tape-recorded fieldwork data in Haitian Creole to train a native self-supervised learning (SSL) model of speech representation (WAV2VEC2). We also apply a continued pre-training approach to pre-trained SSL models of two foreign languages: the lexifier language, French, and an unrelated language, English. We compare the performance of these three SSL models, and of two other foreign SSL models that are fine-tuned directly, on an ASR task, where all five models are fine-tuned on transcribed fieldwork recordings in Haitian Creole. Our results show that the best-performing model is the one trained using a continued pre-training approach on the lexifier language, followed by the native model. We conclude that the ‘mobilising the archive’ approach advocated by Bird (2020) is a promising way forward for designing speech technologies for new languages.
- Anthology ID:
- 2025.computel-main.5
- Volume:
- Proceedings of the Eight Workshop on the Use of Computational Methods in the Study of Endangered Languages
- Month:
- March
- Year:
- 2025
- Address:
- Honolulu, Hawaii, USA
- Editors:
- Jordan Lachler, Godfred Agyapong, Antti Arppe, Sarah Moeller, Aditi Chaudhary, Shruti Rijhwani, Daisy Rosenblum
- Venues:
- ComputEL | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 40–46
- URL:
- https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.computel-main.5/
- Cite (ACL):
- William N. Havard, Renauld Govain, Benjamin Lecouteux, and Emmanuel Schang. 2025. Speech Technologies with Fieldwork Recordings: the Case of Haitian Creole. In Proceedings of the Eight Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 40–46, Honolulu, Hawaii, USA. Association for Computational Linguistics.
- Cite (Informal):
- Speech Technologies with Fieldwork Recordings: the Case of Haitian Creole (Havard et al., ComputEL 2025)
- PDF:
- https://preview.aclanthology.org/Ingest-2025-COMPUTEL/2025.computel-main.5.pdf