Abstract
Most low-resource language technology development is premised on the need to collect data for training statistical models. When we follow the typical process of recording and transcribing text for small Indigenous languages, we hit up against the so-called “transcription bottleneck.” Therefore, it is worth exploring new ways of engaging with speakers which generate data while avoiding the transcription bottleneck. We have deployed a prototype app for speakers to use for confirming system guesses in an approach to transcription based on word spotting. However, in the process of testing the app we encountered many new problems for engagement with speakers. This paper presents a close-up study of the process of deploying data capture technology on the ground in an Australian Aboriginal community. We reflect on our interactions with participants and draw lessons that apply to anyone seeking to develop methods for language data collection in an Indigenous community.

- Anthology ID:
- 2022.acl-long.342
- Volume:
- Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 4988–4998
- URL:
- https://aclanthology.org/2022.acl-long.342
- DOI:
- 10.18653/v1/2022.acl-long.342
- Cite (ACL):
- Eric Le Ferrand, Steven Bird, and Laurent Besacier. 2022. Learning From Failure: Data Capture in an Australian Aboriginal Community. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4988–4998, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Learning From Failure: Data Capture in an Australian Aboriginal Community (Le Ferrand et al., ACL 2022)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2022.acl-long.342.pdf