Beyond a Means to an End: A Case Study in Building Phonotactic Corpora for Central Australian Languages

Saliha Muradoglu, James Gray, Jane Helen Simpson, Michael Proctor, Mark Harvey


Abstract
Linguistic datasets are essential across fields: computational linguists use them for NLP development, theoretical linguists for statistical arguments supporting hypotheses about language, and documentary linguists for preserving examples and aiding grammatical descriptions. Transforming raw data (e.g., recordings or dictionaries) into structured forms (e.g., tables) requires non-trivial decisions within processing pipelines. This paper highlights the importance of these processes in understanding linguistic systems. Our contributions include: (1) an interactive dashboard for four central Australian languages with custom filters, and (2) a demonstration of how data processing decisions influence measured outcomes.
Anthology ID:
2025.resourceful-1.7
Volume:
Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Month:
March
Year:
2025
Address:
Tallinn, Estonia
Editors:
Špela Arhar Holdt, Nikolai Ilinykh, Barbara Scalvini, Micaella Bruton, Iben Nyholm Debess, Crina Madalina Tudor
Venues:
RESOURCEFUL | WS
Publisher:
University of Tartu Library, Estonia
Pages:
32–37
URL:
https://preview.aclanthology.org/fix-sig-urls/2025.resourceful-1.7/
Cite (ACL):
Saliha Muradoglu, James Gray, Jane Helen Simpson, Michael Proctor, and Mark Harvey. 2025. Beyond a Means to an End: A Case Study in Building Phonotactic Corpora for Central Australian Languages. In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 32–37, Tallinn, Estonia. University of Tartu Library, Estonia.
Cite (Informal):
Beyond a Means to an End: A Case Study in Building Phonotactic Corpora for Central Australian Languages (Muradoglu et al., RESOURCEFUL 2025)
PDF:
https://preview.aclanthology.org/fix-sig-urls/2025.resourceful-1.7.pdf