2025
Teaching Sarcasm: Few-Shot Multimodal Sarcasm Detection via Distillation to a Parameter-Efficient Student
Soumyadeep Jana | Ranbir Singh Sanasam
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Multimodal sarcasm detection is challenging, especially in low-resource settings, where subtle image-text contradictions are hard to learn from scarce annotated data, hindering model performance. Parameter-efficient fine-tuning (PEFT) methods such as adapters, LoRA, and prompt tuning reduce overfitting but struggle to reach optimal performance under the limited supervision of few-shot data. We propose PEKD, a unified framework that enhances PEFT methods via distillation from an expert model trained on large-scale sarcasm data, which acts as the teacher. To mitigate unreliable signals from the teacher, we introduce an entropy-aware gating mechanism that dynamically adjusts the distillation strength based on teacher confidence. Experiments on two public datasets demonstrate that PEKD enables PEFT methods to outperform both prior parameter-efficient approaches and large multimodal models, achieving strong results in the few-shot scenario. The framework is modular and adaptable to a wide range of multimodal models and tasks.
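The abstract describes the entropy-aware gate only at a high level. Below is a minimal PyTorch sketch of one plausible instantiation, in which the per-example distillation weight decays with the teacher's normalized predictive entropy; the gating function, temperature, and `alpha` weighting are illustrative assumptions, not the paper's actual design.

```python
import math

import torch
import torch.nn.functional as F

def entropy_gated_distillation_loss(student_logits, teacher_logits, labels,
                                    temperature=2.0, alpha=0.5):
    """Cross-entropy on the few-shot labels plus a KD term whose per-example
    weight shrinks when the teacher's predictive entropy is high
    (illustrative gating; the paper's exact mechanism may differ)."""
    # Hard-label supervision from the few-shot data.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Teacher confidence: entropy of the softened teacher distribution,
    # normalized to [0, 1] by the maximum entropy log(num_classes).
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    entropy = -(teacher_probs * teacher_probs.clamp_min(1e-8).log()).sum(dim=-1)
    gate = 1.0 - entropy / math.log(teacher_logits.size(-1))  # ~1 when confident

    # Soft-label KD term, gated per example before averaging.
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  teacher_probs, reduction="none").sum(dim=-1) * temperature ** 2
    return ce_loss + alpha * (gate * kd).mean()
```

With this form of gating, examples on which the teacher is near-uniform contribute almost no distillation signal, so unreliable teacher predictions cannot drown out the scarce hard-label supervision.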
2024
Continuous Attentive Multimodal Prompt Tuning for Few-Shot Multimodal Sarcasm Detection
Soumyadeep Jana | Animesh Dey | Ranbir Singh Sanasam
Proceedings of the 28th Conference on Computational Natural Language Learning
With the steep rise in multimodal content on social media, multimodal sarcasm detection has gained widespread attention from research communities. Existing studies depend on large-scale data, which is challenging to obtain and expensive to annotate, so investigating this problem in a few-shot scenario is required. Overly complex multimodal models are prone to overfitting on in-domain data, which hampers their performance on out-of-distribution (OOD) data. To address these issues, we propose the Continuous Attentive Multimodal Prompt Tuning model (CAMP), which leverages the prompt tuning paradigm to handle few-shot multimodal sarcasm detection. To overcome the siloed learning process of continuous prompt tokens, we design a novel continuous multimodal attentive prompt in which the continuous tokens intricately engage with both image and text tokens, enabling the assimilation of knowledge from different input modalities. Experimental results indicate that our method outperforms other multimodal baseline methods in the few-shot setting and in OOD scenarios.
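The continuous multimodal attentive prompt is described only in outline; the PyTorch sketch below shows one way such a module could look, with learnable prompt tokens cross-attending over the concatenated image and text token embeddings before being prepended to the sequence. The module name, dimensions, and single-attention-layer design are assumptions for illustration, not CAMP's actual architecture.

```python
import torch
import torch.nn as nn

class MultimodalAttentivePrompt(nn.Module):
    """Continuous prompt tokens that cross-attend to image and text tokens,
    so each prompt token assimilates both modalities instead of being
    learned in isolation (illustrative sketch, not the paper's design)."""

    def __init__(self, num_prompts: int = 8, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, image_tokens: torch.Tensor, text_tokens: torch.Tensor):
        # image_tokens: (B, N_img, dim); text_tokens: (B, N_txt, dim)
        batch = image_tokens.size(0)
        queries = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        context = torch.cat([image_tokens, text_tokens], dim=1)
        # Prompt tokens attend jointly to both modalities.
        attended, _ = self.cross_attn(queries, context, context)
        # Prepend the modality-aware prompts to the text sequence fed to
        # the downstream encoder.
        return torch.cat([attended, text_tokens], dim=1)
```

In a prompt-tuning setup, only `self.prompts` and the attention weights would be trained while the backbone stays frozen, keeping the tunable parameter count small for the few-shot regime.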
2023
Jack-flood at SemEval-2023 Task 5: Hierarchical Encoding and Reciprocal Rank Fusion-Based System for Spoiler Classification and Generation
Sujit Kumar | Aditya Sinha | Soumyadeep Jana | Rahul Mishra | Sanasam Ranbir Singh
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
The rise of social media has been accompanied by an exponential growth in clickbait posts that grab users' attention. Although work has been done to detect clickbait posts, this is the first task focused on generating appropriate spoilers for these potential clickbaits. This paper presents our approach in this direction. We use different encoding techniques that capture the context of the post text and the target paragraph. For spoiler type classification, we propose a hierarchical encoding model with count and document-length features that uses Recurrence over Pretrained Encoding. We also propose combining multiple rankings with reciprocal rank fusion for passage spoiler retrieval and a question-answering approach for phrase spoiler retrieval. For multipart spoiler retrieval, we combine the above two spoiler retrieval methods. Experimental results on the benchmark suggest that our proposed spoiler retrieval methods retrieve spoilers that are semantically very close to the ground-truth spoilers.
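Reciprocal rank fusion itself is a standard technique: each passage's fused score is the sum over the input rankings of 1 / (k + rank). A minimal Python sketch follows; the constant k = 60 is the conventional default, and the example rankings are hypothetical rather than the paper's actual rankers.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of passage ids: each passage accumulates
    1 / (k + rank) from every list that contains it (ranks start at 1)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: fusing two rankers' candidate passage lists.
fused = reciprocal_rank_fusion([["p3", "p1", "p2"], ["p1", "p2", "p4"]])
print(fused)  # "p1" wins: it ranks highly in both lists
```

Because RRF depends only on ranks, not raw scores, heterogeneous retrieval methods can be combined without any score calibration, which suits the multi-ranker setup described above.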