Morphology-Aware Multi-Granularity Representation Learning for Agglutinative Languages

Zhonghao Zhang, Na Liu, Jiajia Ma, Nier Wu, Guiping Liu


Abstract
Low-resource agglutinative languages, characterized by rich morphological inflection and severe vocabulary sparsity in their corpora, have long posed challenges for representation learning. Word-level representations preserve semantic integrity but struggle with sparse surface forms, whereas morpheme-level representations are easier to learn but often lack holistic semantic information. Existing multi-granularity methods typically model the word and phrase levels and have rarely been applied to low-resource agglutinative languages. Focusing on the morphemes of agglutinative languages, this paper proposes MAGNet, a morphology-aware gated multi-granularity pre-training framework. At the morpheme granularity, the framework leverages morphological knowledge, combining morpheme segmentation with morphological tagging to construct fine-grained representations. It further introduces a morphology-aware masked language modeling objective that helps the model learn functional morphological regularities. At the word granularity, a word-level encoder captures contextual semantics and maintains semantic coherence. Finally, a gated fusion mechanism dynamically fuses representations of different granularities according to context. Experiments on dependency parsing and named entity recognition (NER) in two low-resource agglutinative languages, Mongolian and Turkish, show that our method achieves consistent improvements over strong baselines. Ablation studies further confirm the complementary roles of morphological tagging and whole-word modeling in efficient representation learning.
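The abstract describes two mechanisms only at a high level: a morphology-aware masking objective and a gated fusion of morpheme- and word-level representations. The following is a minimal illustrative sketch of one common way each could work; it is not MAGNet's actual implementation, which the abstract does not specify. It assumes that the morpheme-level vector has already been pooled to one vector per word, that the gate is a per-dimension sigmoid over the concatenated morpheme and word vectors, and that "morphology-aware" masking simply raises the masking probability for morphemes whose tag falls in a functional set. The function names, tag labels (`STEM`, `PL`, `P1PL`, `LOC`), and probabilities are hypothetical placeholders.

```python
import math
import random
from typing import List, Optional, Sequence, Set, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fusion(
    h_morph: Sequence[float],
    h_word: Sequence[float],
    w_gate: Sequence[Sequence[float]],
    b_gate: Sequence[float],
) -> List[float]:
    """Fuse a (pooled) morpheme-level vector and a word-level vector for one word.

    Assumed form: g = sigmoid(W [h_morph; h_word] + b),
    fused = g * h_morph + (1 - g) * h_word, applied per dimension.
    w_gate has d rows of length 2d; b_gate has length d.
    """
    d = len(h_morph)
    if len(h_word) != d or len(b_gate) != d or len(w_gate) != d:
        raise ValueError("dimension mismatch")
    concat = list(h_morph) + list(h_word)
    fused = []
    for i in range(d):
        if len(w_gate[i]) != 2 * d:
            raise ValueError("each gate row must have length 2d")
        # The gate sees both views, so the mixing weight depends on the input.
        g = sigmoid(sum(w * x for w, x in zip(w_gate[i], concat)) + b_gate[i])
        fused.append(g * h_morph[i] + (1.0 - g) * h_word[i])
    return fused


def morphology_aware_mask(
    morphemes: Sequence[str],
    tags: Sequence[str],
    functional_tags: Set[str],
    p_functional: float = 0.3,
    p_other: float = 0.1,
    mask_token: str = "[MASK]",
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask morphemes, masking those with functional tags more often.

    Returns the masked sequence and MLM labels (None = not predicted).
    """
    if len(morphemes) != len(tags):
        raise ValueError("one tag per morpheme is required")
    rng = rng or random.Random(0)
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for morpheme, tag in zip(morphemes, tags):
        p = p_functional if tag in functional_tags else p_other
        if rng.random() < p:
            masked.append(mask_token)
            labels.append(morpheme)  # the model must recover this morpheme
        else:
            masked.append(morpheme)
            labels.append(None)
    return masked, labels


if __name__ == "__main__":
    # Turkish "evlerimizde" ("in our houses") = ev + ler + imiz + de.
    # The tags and the functional-tag set are hypothetical placeholders.
    morphs = ["ev", "ler", "imiz", "de"]
    tags = ["STEM", "PL", "P1PL", "LOC"]
    print(morphology_aware_mask(morphs, tags, {"PL", "P1PL", "LOC"},
                                p_functional=1.0, p_other=0.0))
    # -> (['ev', '[MASK]', '[MASK]', '[MASK]'], [None, 'ler', 'imiz', 'de'])
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], [[0.0] * 4] * 2, [0.0, 0.0]))
    # -> [2.0, 3.0]  (zero gate weights give g = 0.5, an even average)
```

A per-dimension gate lets each feature lean toward the morpheme view or the word view independently; a single scalar gate per word would be a simpler alternative under the same assumptions.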
Anthology ID: 2026.acl-srw.92
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1065–1073
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.92/
Cite (ACL): Zhonghao Zhang, Na Liu, Jiajia Ma, Nier Wu, and Guiping Liu. 2026. Morphology-Aware Multi-Granularity Representation Learning for Agglutinative Languages. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 1065–1073, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Morphology-Aware Multi-Granularity Representation Learning for Agglutinative Languages (Zhang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-srw.92.pdf