Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource
Antonios Anastasopoulos, Marika Lekakou, Josep Quer, Eleni Zimianiti, Justin DeBenedetto, David Chiang
Abstract
Most work on part-of-speech (POS) tagging is focused on high-resource languages, or examines low-resource and active learning settings through simulated studies. We evaluate POS tagging techniques on an actual endangered language, Griko. We present a resource that contains 114 narratives in Griko, along with sentence-level translations in Italian, and provides gold annotations for the test set. Based on a previously collected small corpus, we investigate several traditional methods, as well as methods that take advantage of monolingual data or project cross-lingual POS tags. We show that the combination of a semi-supervised method with cross-lingual transfer is more appropriate for this extremely challenging setting, with the best tagger achieving an accuracy of 72.9%. With an applied active learning scheme, which we use to collect sentence-level annotations over the test set, we achieve improvements of more than 21 percentage points.
- Anthology ID:
- C18-1214
- Volume:
- Proceedings of the 27th International Conference on Computational Linguistics
- Month:
- August
- Year:
- 2018
- Address:
- Santa Fe, New Mexico, USA
- Editors:
- Emily M. Bender, Leon Derczynski, Pierre Isabelle
- Venue:
- COLING
- Publisher:
- Association for Computational Linguistics
- Pages:
- 2529–2539
- URL:
- https://aclanthology.org/C18-1214
- Cite (ACL):
- Antonios Anastasopoulos, Marika Lekakou, Josep Quer, Eleni Zimianiti, Justin DeBenedetto, and David Chiang. 2018. Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2529–2539, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal):
- Part-of-Speech Tagging on an Endangered Language: a Parallel Griko-Italian Resource (Anastasopoulos et al., COLING 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/C18-1214.pdf
- Code:
- antonis/grikoresource