Team_One@DravidianLangTech 2026: A Gated Multimodal Architecture for Multi-Level Stance and Target Detection in Malayalam Political Memes
Nimisha M Iyer, Ashmi S N, Balasubramanian Palani, Jobin Jose, Siranjeevi Rajamanickam
Abstract
Stance and target detection in multimodal political memes is challenging in low-resource, highly imbalanced settings. This work uses the Malayalam dataset from the DravidianLangTech 2026 Shared Task (500 samples with a 95.4:4.6 stance imbalance). The primary challenges stem from linguistic variability and visually complex meme formats, which hinder accurate text extraction and effective multimodal alignment. A lightweight yet high-performing multimodal framework is proposed that integrates bilingual OCR, a Vision Transformer (ViT), and IndicBERT to learn complementary visual and textual representations. A gated fusion mechanism combines the multimodal features, while asymmetric loss weighting and post-training threshold optimization address the extreme class imbalance. The method achieves a weighted F1-score of 0.9535 for stance detection and 0.5283 for target identification, demonstrating strong robustness and generalization under realistic multimodal constraints.
- Anthology ID:
- 2026.dravidianlangtech-1.65
- Volume:
- Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
- Month:
- July
- Year:
- 2026
- Address:
- Underline (Virtual)
- Editors:
- Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
- Venues:
- DravidianLangTech | WS
- Publisher:
- Association for Computational Linguistics
- Pages:
- 409–413
- URL:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.65/
- Cite (ACL):
- Nimisha M Iyer, Ashmi S N, Balasubramanian Palani, Jobin Jose, and Siranjeevi Rajamanickam. 2026. Team_One@DravidianLangTech 2026: A Gated Multimodal Architecture for Multi-Level Stance and Target Detection in Malayalam Political Memes. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 409–413, Underline (Virtual). Association for Computational Linguistics.
- Cite (Informal):
- Team_One@DravidianLangTech 2026: A Gated Multimodal Architecture for Multi-Level Stance and Target Detection in Malayalam Political Memes (Iyer et al., DravidianLangTech 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.65.pdf
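The abstract names two components: gated fusion of image and text features, and post-training threshold optimization. Both can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several details are assumptions:

- The gate follows a common GMU-style formulation: each modality is projected with tanh, a sigmoid gate computed from the concatenated inputs mixes the two, and the fused vector is z·h_img + (1 − z)·h_txt.
- The 768-dimensional inputs stand in for ViT and IndicBERT embeddings, and the 256-dimensional hidden size is hypothetical.
- Weights are random rather than trained.
- The threshold search sweeps a fixed grid over the minority-class probability and keeps the value that maximizes weighted F1 on a toy validation set.
- The helper names `gated_fusion`, `weighted_f1`, and `best_threshold` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(x_img, x_txt, W_img, W_txt, W_gate):
    """GMU-style gated fusion (assumed formulation, not confirmed by the paper)."""
    h_img = np.tanh(W_img @ x_img)  # project visual (e.g. ViT) features
    h_txt = np.tanh(W_txt @ x_txt)  # project textual (e.g. IndicBERT) features
    # Per-dimension gate in (0, 1), computed from both raw inputs.
    z = sigmoid(W_gate @ np.concatenate([x_img, x_txt]))
    # Convex combination of the two projected modalities.
    return z * h_img + (1.0 - z) * h_txt

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 over the classes present in y_true."""
    classes, support = np.unique(y_true, return_counts=True)
    total = 0.0
    for c, n in zip(classes, support):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        total += n * (2 * tp / denom if denom else 0.0)
    return total / len(y_true)

def best_threshold(p_minority, y_true, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the decision threshold on the minority-class probability that maximizes weighted F1."""
    scores = [weighted_f1(y_true, (p_minority >= t).astype(int)) for t in grid]
    i = int(np.argmax(scores))  # first grid point achieving the best score
    return grid[i], scores[i]

# Usage with hypothetical sizes and untrained random weights.
rng = np.random.default_rng(0)
d_img, d_txt, d_h = 768, 768, 256
W_img = rng.normal(0.0, 0.02, (d_h, d_img))
W_txt = rng.normal(0.0, 0.02, (d_h, d_txt))
W_gate = rng.normal(0.0, 0.02, (d_h, d_img + d_txt))
fused = gated_fusion(rng.normal(size=d_img), rng.normal(size=d_txt), W_img, W_txt, W_gate)
print(fused.shape)

# Toy validation set (label 1 = minority stance class); not the shared-task data.
y_val = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
p_val = np.array([0.12, 0.22, 0.17, 0.31, 0.07, 0.38, 0.27, 0.33, 0.43, 0.62])
t, f1 = best_threshold(p_val, y_val)
print(t, f1)
```

Under heavy imbalance, a default cut-off of 0.5 can be poorly placed, so the threshold is tuned on held-out data after training. To avoid an optimistically biased estimate, the chosen threshold should be reported on data that was not used to select it.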