Abstract
We propose a novel transcription workflow that combines spoken term detection with a human-in-the-loop approach, and we report on a pilot experiment. This work is grounded in an almost zero-resource scenario involving two endangered languages, in which only a few terms have so far been identified. We show that in the early stages of transcription, when the available data is insufficient to train a robust automatic speech recognition (ASR) system, the transcriptions of a small number of isolated words can be used to bootstrap the transcription of a speech collection.

- Anthology ID: 2020.coling-main.303
- Volume: Proceedings of the 28th International Conference on Computational Linguistics
- Month: December
- Year: 2020
- Address: Barcelona, Spain (Online)
- Venue: COLING
- Publisher: International Committee on Computational Linguistics
- Pages: 3422–3428
- URL: https://aclanthology.org/2020.coling-main.303
- DOI: 10.18653/v1/2020.coling-main.303
- Cite (ACL): Eric Le Ferrand, Steven Bird, and Laurent Besacier. 2020. Enabling Interactive Transcription in an Indigenous Community. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3422–3428, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal): Enabling Interactive Transcription in an Indigenous Community (Le Ferrand et al., COLING 2020)
- PDF: https://preview.aclanthology.org/ingestion-script-update/2020.coling-main.303.pdf
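The workflow in the abstract pairs a spoken term detector with a person who checks its output, so that a handful of transcribed words can propagate through a speech collection. As a minimal sketch, assuming query-by-example detection with subsequence dynamic time warping (DTW) over per-frame acoustic features, one bootstrap round might look like the following. The functions `subsequence_dtw` and `bootstrap`, the Euclidean frame distance, the length-normalised score, the `threshold` parameter, and the `confirm` callback standing in for the human in the loop are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sketch: query-by-example spoken term detection with
# subsequence DTW plus human confirmation. Not the authors' code.
Frame = Sequence[float]


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (e.g. MFCC vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query: List[Frame], utterance: List[Frame]) -> Tuple[float, int]:
    """Align the whole query to the best-matching stretch of the utterance.

    Returns (accumulated cost divided by query length, last matched frame index).
    """
    m, n = len(query), len(utterance)
    prev = [float("inf")] * n
    for i in range(m):
        cur = [float("inf")] * n
        for j in range(n):
            d = frame_dist(query[i], utterance[j])
            if i == 0:
                cur[j] = d  # free start: a match may begin at any utterance frame
            else:
                best = prev[j]  # query advances, utterance frame repeats
                if j > 0:
                    best = min(best, prev[j - 1], cur[j - 1])  # diagonal / horizontal
                cur[j] = d + best
        prev = cur
    end = min(range(n), key=lambda j: prev[j])  # free end: best final frame
    return prev[end] / m, end


def bootstrap(
    lexicon: Dict[str, List[Frame]],      # term transcription -> one spoken example
    corpus: Dict[str, List[Frame]],       # utterance id -> feature frames
    confirm: Callable[[str, str], bool],  # hypothetical human check: is term in utterance?
    threshold: float,                     # hypothetical detection cut-off
) -> Dict[str, List[str]]:
    """Propose detections for each known term and keep only confirmed ones."""
    found: Dict[str, List[str]] = {uid: [] for uid in corpus}
    for term, example in lexicon.items():
        hits = []
        for uid, frames in corpus.items():
            score, _ = subsequence_dtw(example, frames)
            if score <= threshold:
                hits.append((score, uid))
        for _, uid in sorted(hits):  # present the most confident candidates first
            if confirm(term, uid):
                found[uid].append(term)
    return found


if __name__ == "__main__":
    # Toy 1-D "features": the term occurs inside u1 but not in u2.
    lexicon = {"word_a": [[1.0], [2.0], [3.0]]}
    corpus = {"u1": [[0.0], [1.0], [2.0], [3.0], [0.0]], "u2": [[5.0], [5.0], [5.0]]}
    print(bootstrap(lexicon, corpus, confirm=lambda term, uid: True, threshold=1.0))
```

In this sketch the detector only proposes candidates and the person decides, which fits a setting where too little data exists to trust an automatic system. A natural extension, also an assumption here, would be to add each confirmed occurrence to the lexicon as a further spoken example for the next round.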