Māori Loanwords: A Corpus of New Zealand English Tweets

David Trye, Andreea Calude, Felipe Bravo-Marquez, Te Taka Keegan


Abstract
Māori loanwords are widely used in New Zealand English for various social functions by New Zealanders within and outside of the Māori community. Motivated by the lack of linguistic resources for studying how Māori loanwords are used in social media, we present a new corpus of New Zealand English tweets. We collected tweets containing selected Māori words that are likely to be known by New Zealanders who do not speak Māori. Since over 30% of these words turned out to be irrelevant, we manually annotated a sample of our tweets into relevant and irrelevant categories. This data was used to train machine learning models to automatically filter out irrelevant tweets.
Anthology ID:
P19-2018
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Month:
July
Year:
2019
Address:
Florence, Italy
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
136–142
URL:
https://aclanthology.org/P19-2018
DOI:
10.18653/v1/P19-2018
Cite (ACL):
David Trye, Andreea Calude, Felipe Bravo-Marquez, and Te Taka Keegan. 2019. Māori Loanwords: A Corpus of New Zealand English Tweets. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 136–142, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Māori Loanwords: A Corpus of New Zealand English Tweets (Trye et al., ACL 2019)
PDF:
https://preview.aclanthology.org/update-css-js/P19-2018.pdf