Abstract
In this paper, we present ongoing work on developing language resources and basic NLP tools for an undocumented variety of Romansh, in the context of a language documentation and language acquisition project. Our tools are meant to improve the speed and reliability of corpus annotation for noisy data involving large amounts of code-switching, occurrences of child speech, and orthographic noise. Increasing the efficiency of language resource development for language documentation and acquisition research also constitutes a step towards solving the data sparsity issues with which researchers have been struggling.
- Anthology ID:
- W17-2212
- Volume:
- Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature
- Month:
- August
- Year:
- 2017
- Address:
- Vancouver, Canada
- Editors:
- Beatrice Alex, Stefania Degaetano-Ortlieb, Anna Feldman, Anna Kazantseva, Nils Reiter, Stan Szpakowicz
- Venue:
- LaTeCH
- SIG:
- SIGHUM
- Publisher:
- Association for Computational Linguistics
- Pages:
- 89–94
- URL:
- https://aclanthology.org/W17-2212
- DOI:
- 10.18653/v1/W17-2212
- Cite (ACL):
- Géraldine Walther and Benoît Sagot. 2017. Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin. In Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 89–94, Vancouver, Canada. Association for Computational Linguistics.
- Cite (Informal):
- Speeding up corpus development for linguistic research: language documentation and acquisition in Romansh Tuatschin (Walther & Sagot, LaTeCH 2017)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/W17-2212.pdf