Sree Nithish Reddy Gunapati


2026

Multilingual Natural Language Understanding (NLU) systems often struggle to adapt when new languages or new semantic labels are introduced with only a few annotated examples. This challenge is particularly pronounced for low-resource languages, where limited supervision and evolving label spaces make conventional joint-label classification unstable. Most existing multilingual NLU models treat each language-semantic pair as an independent class, which entangles linguistic and semantic representations and hinders few-shot adaptation. We propose Dual-Axis Compositional Few-Shot Learning, a framework that explicitly factorizes the representation space into a linguistic embedding axis and a semantic embedding axis, so that language variation and domain-intent semantics can be modeled independently. Joint representations are constructed compositionally through multiplicative interaction of the axis-specific embeddings, allowing controlled adaptation when either the language set or the semantic label space evolves. The framework integrates factorized prototype learning, axis-structured contrastive alignment, and disentanglement regularization based on statistical independence measured with the Hilbert-Schmidt Independence Criterion (HSIC) and on Jacobian-based cross-axis decorrelation. Experiments on six low-resource Indic languages spanning the Indo-Aryan and Dravidian families (Hindi, Bengali, Sanskrit, Assamese, Tamil, and Telugu) demonstrate strong performance under two structured generalization regimes. The model achieves 81.12% accuracy when adapting to few-shot languages with known semantics and 63.5% accuracy when learning new semantic classes from few-shot examples, alongside 89.56% accuracy on known languages with seen semantics. These results show that axis-factorized representations enable stable compositional generalization, offering a promising direction for scalable multilingual NLU in linguistically diverse low-resource settings.
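
As a rough illustration of the compositional construction and the HSIC regularizer summarized above, the following NumPy sketch builds joint prototypes from per-language and per-semantic prototypes and computes an empirical HSIC value between two sets of embeddings. Several choices here are assumptions made for illustration and are not taken from the paper: the multiplicative interaction is taken to be an elementwise product, classification uses cosine similarity to the nearest joint prototype, HSIC uses the standard biased estimator with RBF kernels, and the names `compose_prototypes`, `classify`, `rbf_kernel`, and `hsic` are hypothetical.

```python
import numpy as np


def compose_prototypes(lang_protos, sem_protos):
    """Joint prototypes for every (language, semantic) pair.

    Assumption: the multiplicative interaction is an elementwise product.
    lang_protos: (L, d), sem_protos: (S, d) -> result: (L, S, d).
    """
    return lang_protos[:, None, :] * sem_protos[None, :, :]


def classify(query, joint_protos, eps=1e-12):
    """Return the (language, semantic) indices of the nearest joint prototype
    under cosine similarity."""
    norms = np.linalg.norm(joint_protos, axis=-1) * np.linalg.norm(query)
    sims = (joint_protos @ query) / (norms + eps)  # shape (L, S)
    lang_idx, sem_idx = np.unravel_index(np.argmax(sims), sims.shape)
    return int(lang_idx), int(sem_idx)


def rbf_kernel(x, sigma=1.0):
    """RBF (Gaussian) kernel matrix for the rows of x, shape (n, n)."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = np.maximum(sq + sq.T - 2.0 * x @ x.T, 0.0)  # squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2.

    Larger values indicate stronger statistical dependence between the
    rows of x and the rows of y.
    """
    n = x.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    k = rbf_kernel(x, sigma)
    l = rbf_kernel(y, sigma)
    return float(np.trace(k @ h @ l @ h) / (n - 1) ** 2)


# Toy usage with random embeddings (illustrative only, not real data).
rng = np.random.default_rng(0)
lang_protos = rng.normal(size=(2, 4))   # 2 languages, dimension 4
sem_protos = rng.normal(size=(3, 4))    # 3 semantic classes, dimension 4
joint = compose_prototypes(lang_protos, sem_protos)
pred = classify(joint[1, 2], joint)

x = rng.normal(size=(200, 2))
y = rng.normal(size=(200, 2))           # drawn independently of x
hsic_dependent = hsic(x, x)
hsic_independent = hsic(x, y)
```

Under this reading, adding a new language requires estimating only one new row of `lang_protos`, since the joint prototypes for all existing semantic classes follow from the product. Used as a regularizer, the HSIC value between the two axes' embeddings would be minimized so that the axes carry statistically independent information.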