Initial Experiments In Cross-Lingual Morphological Analysis Using Morpheme Segmentation
Vladislav Mikhailov, Lorenzo Tosi, Anastasia Khorosheva, Oleg Serikov
Abstract
The paper describes initial experiments in data-driven cross-lingual morphological analysis of open-category words using a combination of unsupervised morpheme segmentation, annotation projection, and an LSTM encoder-decoder model with attention. Our algorithm performs lemmatisation and generates morphological analyses for previously unseen surface forms in a low-resource language, given annotated data only for related languages. Despite the inherently lossy annotation projection, we achieved the best lemmatisation F1-score in the VarDial 2019 Shared Task on Cross-Lingual Morphological Analysis for both Karachay-Balkar (Turkic family, agglutinative morphology) and Sardinian (Romance family, fusional morphology).
- Anthology ID:
- W19-1415
- Volume:
- Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects
- Month:
- June
- Year:
- 2019
- Address:
- Ann Arbor, Michigan
- Editors:
- Marcos Zampieri, Preslav Nakov, Shervin Malmasi, Nikola Ljubešić, Jörg Tiedemann, Ahmed Ali
- Venue:
- VarDial
- Publisher:
- Association for Computational Linguistics
- Pages:
- 144–152
- URL:
- https://aclanthology.org/W19-1415
- DOI:
- 10.18653/v1/W19-1415
- Cite (ACL):
- Vladislav Mikhailov, Lorenzo Tosi, Anastasia Khorosheva, and Oleg Serikov. 2019. Initial Experiments In Cross-Lingual Morphological Analysis Using Morpheme Segmentation. In Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects, pages 144–152, Ann Arbor, Michigan. Association for Computational Linguistics.
- Cite (Informal):
- Initial Experiments In Cross-Lingual Morphological Analysis Using Morpheme Segmentation (Mikhailov et al., VarDial 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/W19-1415.pdf
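The abstract combines morpheme segmentation with annotation projection from related languages. To give a rough idea of what "projecting" an analysis onto an unseen word means, here is a minimal sketch. It assumes a deliberately simplified stand-in: suffixes are learned by stripping the lemma from annotated source-language forms, and a target word is segmented by greedy longest-known-suffix matching. This is not the authors' pipeline, which uses unsupervised segmentation and an LSTM encoder-decoder. The Turkish-style word forms and UniMorph-style tags are hypothetical toy data, and the helper names `suffix_table` and `analyse` are illustrative only.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotated data for a related (source) language:
# (surface form, lemma, tag string). Not taken from the shared task.
SOURCE = [
    ("evler", "ev", "N;PL"),
    ("kitaplar", "kitap", "N;PL"),
    ("evde", "ev", "N;LOC;SG"),
]


def suffix_table(annotated):
    """Map each suffix (surface form minus its lemma) to counts of the tags it carried."""
    table = defaultdict(Counter)
    for surface, lemma, tags in annotated:
        # Only purely suffixing forms are usable in this simplified sketch.
        if surface.startswith(lemma) and len(surface) > len(lemma):
            table[surface[len(lemma):]][tags] += 1
    return table


def analyse(word, table, min_stem=2):
    """Strip the longest known suffix and project its most frequent tag set.

    Returns (lemma, tags). If no known suffix matches, the word itself is
    returned as the lemma with tags=None, which illustrates why projection
    is lossy: unseen or phonologically altered suffixes are simply missed.
    """
    # Smaller i means a longer suffix, so the first match is the longest one.
    for i in range(min_stem, len(word)):
        suffix = word[i:]
        if suffix in table:
            tags, _count = table[suffix].most_common(1)[0]
            return word[:i], tags
    return word, None


table = suffix_table(SOURCE)
print(analyse("kalemler", table))  # ('kalem', 'N;PL')
print(analyse("okul", table))      # ('okul', None)
```

A real system would replace the greedy suffix lookup with a learned segmenter and would need to handle fusional morphology (as in Sardinian), where morphemes do not split as cleanly.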