Learning From Failure: Data Capture in an Australian Aboriginal Community

Eric Le Ferrand, Steven Bird, Laurent Besacier


Abstract
Most low-resource language technology development is premised on the need to collect data for training statistical models. When we follow the typical process of recording and transcribing text for small Indigenous languages, we hit up against the so-called “transcription bottleneck.” Therefore, it is worth exploring new ways of engaging with speakers that generate data while avoiding the transcription bottleneck. We have deployed a prototype app that speakers use to confirm system guesses, in an approach to transcription based on word spotting. However, in the process of testing the app, we encountered many new problems for engagement with speakers. This paper presents a close-up study of the process of deploying data capture technology on the ground in an Australian Aboriginal community. We reflect on our interactions with participants and draw lessons that apply to anyone seeking to develop methods for language data collection in an Indigenous community.
Anthology ID:
2022.acl-long.342
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
4988–4998
URL:
https://aclanthology.org/2022.acl-long.342
DOI:
10.18653/v1/2022.acl-long.342
Cite (ACL):
Eric Le Ferrand, Steven Bird, and Laurent Besacier. 2022. Learning From Failure: Data Capture in an Australian Aboriginal Community. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 4988–4998, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Learning From Failure: Data Capture in an Australian Aboriginal Community (Le Ferrand et al., ACL 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.acl-long.342.pdf
Video:
https://preview.aclanthology.org/ingestion-script-update/2022.acl-long.342.mp4