Sakshi Singh


2026

Multilingual Natural Language Understanding (NLU) systems often struggle to adapt when new languages or new semantic labels are introduced with only a few annotated examples. This challenge is particularly pronounced for low-resource languages, where limited supervision and evolving label spaces make conventional joint-label classification approaches unstable. Most existing multilingual NLU models treat each language-semantic pair as an independent class, entangling linguistic and semantic representations and hindering few-shot adaptation. We propose Dual-Axis Compositional Few-Shot Learning, a framework that explicitly factorizes the representation space into linguistic and semantic embedding axes, enabling independent modeling of language variation and domain-intent semantics. Joint representations are constructed compositionally through multiplicative interaction of axis-specific embeddings, allowing controlled adaptation when either the language set or the semantic label space evolves. The framework integrates factorized prototype learning, axis-structured contrastive alignment, and disentanglement regularization using HSIC-based statistical independence and Jacobian-based cross-axis decorrelation. Experiments on six low-resource Indic languages spanning the Indo-Aryan and Dravidian families (Hindi, Bengali, Sanskrit, Assamese, Tamil, and Telugu) demonstrate strong performance under two structured generalization regimes. The model achieves 81.12% accuracy when adapting to few-shot languages with known semantics and 63.5% accuracy when learning new semantic classes from few-shot examples, along with 89.56% accuracy on known languages with seen semantics. These results show that axis-factorized representations enable stable compositional generalization, offering a promising direction for scalable multilingual NLU in linguistically diverse low-resource settings.
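The compositional construction and the HSIC-based independence term described in this abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`build_prototypes`, `nearest_prototype`, `hsic`), the choice of an element-wise product as the "multiplicative interaction", the cosine-similarity classifier, and the RBF kernel with the standard biased HSIC estimator are all assumptions made for illustration.

```python
import numpy as np

def build_prototypes(lang_emb, sem_emb):
    """Compose joint prototypes as element-wise products of axis embeddings.

    lang_emb: (L, d) language-axis embeddings (hypothetical).
    sem_emb:  (S, d) semantic-axis embeddings (hypothetical).
    Returns an (L, S, d) array with one prototype per (language, label) pair.
    """
    return lang_emb[:, None, :] * sem_emb[None, :, :]

def nearest_prototype(query, prototypes):
    """Return the (language, label) indices of the most cosine-similar prototype."""
    L, S, d = prototypes.shape
    flat = prototypes.reshape(L * S, d)
    norms = np.linalg.norm(flat, axis=1) * np.linalg.norm(query) + 1e-12
    sims = flat @ query / norms
    return divmod(int(np.argmax(sims)), S)

def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix for the rows of x."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = np.maximum(sq + sq.T - 2.0 * x @ x.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2.

    Values near zero suggest the two sets of features are close to
    statistically independent under the chosen kernel.
    """
    n = x.shape[0]
    K, L = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lang = rng.normal(size=(3, 8))  # 3 toy languages
    sem = rng.normal(size=(4, 8))   # 4 toy semantic labels
    protos = build_prototypes(lang, sem)
    print(nearest_prototype(lang[1] * sem[2], protos))
    z_lang = rng.normal(size=(200, 2))
    z_sem = rng.normal(size=(200, 2))
    print(hsic(z_lang, z_lang), hsic(z_lang, z_sem))
```

Under this factorization, adding a new language or a new label means adding one row to `lang_emb` or `sem_emb` rather than one class per language-label pair, which is the kind of controlled adaptation the abstract refers to. In a disentanglement regularizer, an HSIC term of this form would be minimized between the two axes.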

2020

Social media abounds in visual and textual information, presented together or in isolation. Memes are the most popular example of the former. In this paper, we present our approaches to the Memotion Analysis problem posed in SemEval-2020 Task 8. The goal of this task is to classify memes based on their emotional content and sentiment. We leverage techniques from Natural Language Processing (NLP) and Computer Vision (CV) for the sentiment classification of internet memes (Subtask A). We consider bimodal (text and image) as well as unimodal (text-only) techniques, ranging from the Naïve Bayes classifier to Transformer-based approaches. Our results show that a text-only approach, a simple Feed Forward Neural Network (FFNN) with Word2vec embeddings as input, outperforms all the others. We ranked first in the sentiment analysis task, with a relative improvement of 63% over the baseline macro-F1 score. Our work is relevant to any task concerned with combining different modalities.
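For readers unfamiliar with the metric quoted above, the following is a minimal pure-Python sketch of macro-F1 and of a relative improvement over a baseline score. The function names and the toy labels are illustrative assumptions; they are not the task's official scorer or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def relative_improvement(score, baseline):
    """Fractional gain of `score` over `baseline` (0.5 means 50% better)."""
    return (score - baseline) / baseline

if __name__ == "__main__":
    # Toy sentiment labels, made up for illustration only.
    gold = ["positive", "positive", "negative", "neutral"]
    pred = ["positive", "negative", "negative", "neutral"]
    print(macro_f1(gold, pred))
```

Because macro-F1 averages per-class scores without weighting by class frequency, minority sentiment classes count as much as the majority class, which is why it is a common choice for imbalanced label distributions.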