Ashmi S N


2026

Stance and target detection in multimodal political memes presents notable challenges in low-resource and highly imbalanced settings.This task is based on the Malayalam dataset from the DravidianLangTech 2026 Shared Task(500 samples with a 95.4:4.6 stance imbalance).The primary challenges stem from linguistic variability and visually complex meme formats,which hinder accurate text extraction and effective multimodal alignment. A lightweight yet high-performing multimodal framework is proposed that integrates bilingual OCR, a Vision Transformer (ViT), and IndicBERT to learn complementary visual and textual representations. A gated fusion mechanism effectivelycombines multimodal features, while asymmetric loss weighting and post-training threshold optimization address extreme class imbalance. The methodology achieves a Weighted F1-score of 0.9535 for stance detection and 0.5283 for target identification, demonstrating strong robustness and generalization under realistic multimodal constraints.