Abstract
This paper evaluates lemmatization, POS-tagging, and morphological analysis for four Armenian varieties: Classical Armenian, Modern Eastern Armenian, Modern Western Armenian, and the under-documented Getashen dialect. It compares traditional RNN models, multilingual models like mDeBERTa, and large language models (ChatGPT) using supervised, transfer learning, and zero/few-shot learning approaches. The study finds that RNN models are particularly strong in POS-tagging, while large language models demonstrate high adaptability, especially in handling previously unseen dialect variations. The research highlights the value of cross-variational and in-context learning for enhancing NLP performance in low-resource languages, offering crucial insights into model transferability and supporting the preservation of endangered dialects.

- Anthology ID: 2024.nlp4dh-1.42
- Volume: Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
- Month: November
- Year: 2024
- Address: Miami, USA
- Editors: Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
- Venue: NLP4DH
- Publisher: Association for Computational Linguistics
- Pages: 438–449
- URL: https://aclanthology.org/2024.nlp4dh-1.42
- DOI: 10.18653/v1/2024.nlp4dh-1.42
- Cite (ACL): Chahan Vidal-Gorène, Nadi Tomeh, and Victoria Khurshudyan. 2024. Cross-Dialectal Transfer and Zero-Shot Learning for Armenian Varieties: A Comparative Analysis of RNNs, Transformers and LLMs. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 438–449, Miami, USA. Association for Computational Linguistics.
- Cite (Informal): Cross-Dialectal Transfer and Zero-Shot Learning for Armenian Varieties: A Comparative Analysis of RNNs, Transformers and LLMs (Vidal-Gorène et al., NLP4DH 2024)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2024.nlp4dh-1.42.pdf