Abstract
We describe a fully unsupervised cross-lingual transfer approach for part-of-speech (POS) tagging under a truly low resource scenario. We assume access to parallel translations between the target language and one or more source languages for which POS taggers are available. We use the Bible as parallel data in our experiments: small size, out-of-domain and covering many diverse languages. Our approach innovates in three ways: 1) a robust approach of selecting training instances via cross-lingual annotation projection that exploits best practices of unsupervised type and token constraints, word-alignment confidence and density of projected POS, 2) a Bi-LSTM architecture that uses contextualized word embeddings, affix embeddings and hierarchical Brown clusters, and 3) an evaluation on 12 diverse languages in terms of language family and morphological typology. In spite of the use of limited and out-of-domain parallel data, our experiments demonstrate significant improvements in accuracy over previous work. In addition, we show that using multi-source information, either via projection or output combination, improves the performance for most target languages.

- Anthology ID: 2020.emnlp-main.391
- Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Month: November
- Year: 2020
- Address: Online
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 4820–4831
- URL: https://aclanthology.org/2020.emnlp-main.391
- DOI: 10.18653/v1/2020.emnlp-main.391
- Cite (ACL): Ramy Eskander, Smaranda Muresan, and Michael Collins. 2020. Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4820–4831, Online. Association for Computational Linguistics.
- Cite (Informal): Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios (Eskander et al., EMNLP 2020)
- PDF: https://preview.aclanthology.org/nodalida-main-page/2020.emnlp-main.391.pdf