Abstract
Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form. However, most of the prior work on this topic has focused on high resource languages. In this paper, we evaluate cross-lingual approaches for low resource languages, especially in the context of morphologically rich Indian languages. We test our model on six languages from two different families and develop linguistic insights into each model’s performance.

- Anthology ID:
- 2020.coling-main.534
- Volume:
- Proceedings of the 28th International Conference on Computational Linguistics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 6070–6076
- URL:
- https://aclanthology.org/2020.coling-main.534
- DOI:
- 10.18653/v1/2020.coling-main.534
- Cite (ACL):
- Kumar Saurav, Kumar Saunack, and Pushpak Bhattacharyya. 2020. Analysing cross-lingual transfer in lemmatisation for Indian languages. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6070–6076, Barcelona, Spain (Online). International Committee on Computational Linguistics.
- Cite (Informal):
- Analysing cross-lingual transfer in lemmatisation for Indian languages (Saurav et al., COLING 2020)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2020.coling-main.534.pdf