Hanieh Naderi
2026
YNWAAZ at SemEval-2026 Task 1: Bridging the Semantic-Visual Gap: Multimodal Humor Generation
Mohammad Erfan Zare | Tahere Abbasi | Hadi Veisi | Sayin Ala | Hanieh Naderi
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Developing computational humor systems at a multilingual and multimodal scale requires moving beyond simple text generation paradigms to focus on intent and context understanding. In this study, we address two key limitations in foundation models: Association Failure in textual tasks, which prevents the formation of coherent semantic links between incongruous concepts, and Temporal Blindness in video processing, which disrupts narrative comprehension. To tackle these challenges, we propose a unified architecture comprising an Intent-Aware RAG system for mitigating linguistic gaps across English, Spanish, and Chinese, and a Cascaded Visual Perception pipeline for modeling the narrative structure of video content. A key innovation of this work is the use of a small language model (TinyLlama) as a Semantic Denoise Filter that converts noisy visual signals into structured, coherent textual representations. Experimental results demonstrate that this modular architecture reduces cultural-semantic gaps in certain languages and produces outputs that generally align better with human humor preferences, though highly nuanced languages still present a challenge.
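As a rough illustration of the idea of turning noisy visual signals into structured text, the sketch below orders per-frame captions in time, filters them, and emits a timestamped narrative. It is not the authors' implementation: the abstract does not describe the pipeline's internals, and the TinyLlama-based filter is replaced here by a simple rule-based stand-in. The names FrameCaption, denoise, and to_narrative, and the min_tokens parameter, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrameCaption:
    """A caption for one sampled video frame (hypothetical structure)."""
    timestamp: float  # seconds from the start of the video
    text: str


def denoise(captions, min_tokens=2):
    """Rule-based stand-in for a small-LM filter (assumption, not TinyLlama).

    Sorts captions by time, normalizes whitespace and case, drops captions
    shorter than min_tokens, and drops consecutive duplicates.
    """
    ordered = sorted(captions, key=lambda c: c.timestamp)
    kept = []
    for cap in ordered:
        text = " ".join(cap.text.split()).lower()
        if len(text.split()) < min_tokens:
            continue  # too short to carry narrative content
        if kept and kept[-1].text == text:
            continue  # same scene description as the last kept caption
        kept.append(FrameCaption(cap.timestamp, text))
    return kept


def to_narrative(captions):
    """Render the kept captions as a timestamped, ordered text block."""
    return "\n".join(f"[{c.timestamp:.1f}s] {c.text}" for c in captions)


raw = [
    FrameCaption(2.0, "a cat jumps"),
    FrameCaption(0.0, "A cat  sits"),
    FrameCaption(1.0, "a cat sits"),
    FrameCaption(1.5, "blur"),
]
narrative = to_narrative(denoise(raw))
print(narrative)
```

Keeping explicit timestamps in the output is one simple way to preserve event order for a downstream text-only generator; a learned filter would replace the normalization and deduplication rules used here.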