Yuyuan Li
2026
“I See What You Did There”: Can Large Vision-Language Models Understand Multimodal Puns?
Naen Xu | Jiayi Sheng | Changjiang Li | Chunyi Zhou | Yuyuan Li | Tianyu Du | Jun Wang | Zhihui Fu | Jinbao Li | Shouling Ji
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Puns are a common form of rhetorical wordplay that exploits polysemy and phonetic similarity to create humor. In multimodal puns, visual and textual elements synergize to ground the literal sense and evoke the figurative meaning simultaneously. Although Vision-Language Models (VLMs) are widely used in multimodal understanding and generation, their ability to understand puns has not been systematically studied due to a scarcity of rigorous benchmarks. To address this, we first propose a multimodal pun generation pipeline. We then introduce MultiPun, a dataset comprising diverse types of puns alongside adversarial non-pun distractors. Our evaluation reveals that most models struggle to distinguish genuine puns from these distractors. Moreover, we propose both prompt-level and model-level strategies to enhance pun comprehension, with an average improvement of 16.5% in F1 scores. Our findings provide valuable insights for developing future VLMs that master the subtleties of human-like humor via cross-modal reasoning.
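The abstract reports its gains as F1 on telling genuine puns from non-pun distractors. As a reminder of what that metric measures, here is a minimal sketch of binary F1. The labels are invented for illustration, and the abstract does not say whether the paper uses binary, macro, or another F1 variant.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class (here: 1 = pun, 0 = distractor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

A model that labels every distractor as a pun gets perfect recall but poor precision, so F1 penalizes exactly the failure the abstract describes.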
2024
Fine-grained Pluggable Gradient Ascent for Knowledge Unlearning in Language Models
XiaoHua Feng | Chaochao Chen | Yuyuan Li | Zibin Lin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Pre-trained language models acquire knowledge from vast amounts of text data, which can inadvertently contain sensitive information. To mitigate the presence of undesirable knowledge, the task of knowledge unlearning becomes crucial for language models. Previous research relies on gradient ascent methods to achieve knowledge unlearning, which are simple and effective. However, this approach computes gradients for every token in the sequence, potentially compromising the general ability of language models. To overcome this limitation, we propose an adaptive objective that computes gradients with fine-grained control, specifically targeting sensitive tokens. Our adaptive objective is pluggable, ensuring simplicity and enabling extension to the regularization-based framework that utilizes non-target data or other models to preserve general ability. Through extensive experiments targeting the removal of typical sensitive data, we demonstrate that our proposed method enhances the general ability of language models while achieving knowledge unlearning. Additionally, it demonstrates the capability to adapt to behavior alignment, eliminating all the undesirable knowledge within a specific domain.
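The contrast between sequence-wide gradient ascent and token-targeted gradient ascent can be sketched with a simple mask. This is an assumption-laden illustration, not the paper's adaptive objective: the function name, the binary mask, and the example probabilities are all hypothetical, and the abstract does not specify how sensitive tokens are selected or weighted.

```python
import math


def masked_unlearning_loss(token_log_probs, sensitive_mask):
    """Mean log-likelihood over the tokens flagged as sensitive.

    Minimizing this value pushes down the probability of the flagged tokens,
    which is the same as gradient ascent on their negative log-likelihood.
    Unflagged tokens are excluded, so they contribute no gradient.
    """
    if len(token_log_probs) != len(sensitive_mask):
        raise ValueError("token_log_probs and sensitive_mask must have equal length")
    selected = [lp for lp, m in zip(token_log_probs, sensitive_mask) if m]
    if not selected:
        # Nothing flagged: there is nothing to unlearn in this sequence.
        return 0.0
    return sum(selected) / len(selected)


# Hypothetical per-token log-probabilities for a four-token sequence.
log_probs = [math.log(0.9), math.log(0.5), math.log(0.25), math.log(0.8)]

# Sequence-wide variant: every token is included.
full_loss = masked_unlearning_loss(log_probs, [1, 1, 1, 1])

# Token-targeted variant: only the two middle tokens are treated as sensitive.
targeted_loss = masked_unlearning_loss(log_probs, [0, 1, 1, 0])
```

In a real training loop the log-probabilities would come from the model and the loss would be backpropagated; the mask is what keeps the update away from tokens that carry general language ability.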