Abstract
Low-resource languages continue to present challenges for current NLP methods, and multilingual NLP is gaining attention in the research community. One of the main issues is the lack of sufficient high-quality annotated data for low-resource languages. In this paper, we show how labeled data for high-resource languages such as English can be used in low-resource NLP. We present two silver datasets for coreference resolution in Ukrainian, adapted from existing English data by manual translation and machine translation in combination with automatic alignment and annotation projection. The code is made publicly available.
- Anthology ID:
- 2023.unlp-1.8
- Volume:
- Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP)
- Month:
- May
- Year:
- 2023
- Address:
- Dubrovnik, Croatia
- Editor:
- Mariana Romanyshyn
- Venue:
- UNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 62–72
- URL:
- https://aclanthology.org/2023.unlp-1.8
- DOI:
- 10.18653/v1/2023.unlp-1.8
- Cite (ACL):
- Pavlo Kuchmiichuk. 2023. Silver Data for Coreference Resolution in Ukrainian: Translation, Alignment, and Projection. In Proceedings of the Second Ukrainian Natural Language Processing Workshop (UNLP), pages 62–72, Dubrovnik, Croatia. Association for Computational Linguistics.
- Cite (Informal):
- Silver Data for Coreference Resolution in Ukrainian: Translation, Alignment, and Projection (Kuchmiichuk, UNLP 2023)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2023.unlp-1.8.pdf
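The annotation projection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's released code: the function names `project_span` and `project_clusters`, the inclusive token-span representation, and the hand-written alignment are all assumptions made for illustration. The idea shown is the generic one: each English mention span is mapped to the smallest Ukrainian span that covers its aligned tokens, and clusters left with fewer than two mentions are discarded.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]        # inclusive (start, end) token indices
Alignment = List[Tuple[int, int]]  # (source_token_index, target_token_index) pairs


def project_span(span: Span, alignment: Alignment) -> Optional[Span]:
    """Map a source span to the smallest target span covering its aligned tokens."""
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None  # no aligned target token, so the mention cannot be projected
    return (min(targets), max(targets))


def project_clusters(clusters: List[List[Span]], alignment: Alignment) -> List[List[Span]]:
    """Project every mention of every cluster; keep clusters with at least two mentions."""
    projected = []
    for cluster in clusters:
        spans = [p for p in (project_span(m, alignment) for m in cluster) if p is not None]
        if len(spans) >= 2:
            projected.append(spans)
    return projected


# Hypothetical example (alignment written by hand, not produced by an aligner):
# English:   Anna(0) said(1) she(2) left(3)
# Ukrainian: Анна(0) сказала(1) ,(2) що(3) вона(4) пішла(5)
alignment = [(0, 0), (1, 1), (2, 4), (3, 5)]
english_clusters = [[(0, 0), (2, 2)]]  # "Anna" and "she" corefer

print(project_clusters(english_clusters, alignment))  # [[(0, 0), (4, 4)]]
```

In practice the alignment pairs would come from an automatic word aligner, and noisy or missing alignments are one reason the resulting data is called silver rather than gold.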