2024
LMEME at SemEval-2024 Task 4: Teacher Student Fusion - Integrating CLIP with LLMs for Enhanced Persuasion Detection
Shiyi Li | Yike Wang | Liang Yang | Shaowu Zhang | Hongfei Lin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This paper describes our system for SemEval-2024 Task 4, Multilingual Detection of Persuasion Techniques in Memes. Our team proposes a detection system built on a Teacher-Student Fusion framework. First, a Large Language Model serves as the teacher, performing abductive reasoning over the multimodal inputs to generate background knowledge about persuasion techniques, which assists in training a smaller downstream model. The student model uses CLIP as the encoder for text and image features, and an attention mechanism is incorporated for modality alignment. Our system achieves a Macro-F1 score of 0.8103, ranking 1st out of 20 on the English leaderboard of Subtask 2b. In Bulgarian, Macedonian, and Arabic, our system ranks 1st, 3rd, and 14th out of 15, respectively.
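A minimal sketch of the student side described above, assuming a Hugging Face CLIP checkpoint ("openai/clip-vit-base-patch32"), a multi-label classification head, and a single cross-attention layer for modality alignment; class and variable names are illustrative and this is not the authors' released code.

```python
import torch
import torch.nn as nn
from transformers import CLIPModel

class CLIPStudent(nn.Module):
    """Sketch of a CLIP-based student classifier with cross-modal attention."""

    def __init__(self, num_labels: int, clip_name: str = "openai/clip-vit-base-patch32"):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        d_text = self.clip.config.text_config.hidden_size    # 512 for this checkpoint
        d_img = self.clip.config.vision_config.hidden_size   # 768 for this checkpoint
        self.img_proj = nn.Linear(d_img, d_text)              # map image patches into the text space
        # Text tokens attend over projected image patches (modality alignment).
        self.cross_attn = nn.MultiheadAttention(d_text, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(d_text, num_labels)        # multi-label persuasion-technique logits

    def forward(self, input_ids, attention_mask, pixel_values):
        # In this sketch, the teacher-generated background knowledge would simply be
        # concatenated with the meme text before tokenization (an assumption).
        out = self.clip(input_ids=input_ids, attention_mask=attention_mask,
                        pixel_values=pixel_values)
        text_tokens = out.text_model_output.last_hidden_state                   # (B, L_t, d_text)
        img_tokens = self.img_proj(out.vision_model_output.last_hidden_state)   # (B, L_i, d_text)
        fused, _ = self.cross_attn(query=text_tokens, key=img_tokens, value=img_tokens)
        # Mean-pool fused text tokens over the attention mask, then classify.
        pooled = (fused * attention_mask.unsqueeze(-1)).sum(1) / attention_mask.sum(1, keepdim=True)
        return self.classifier(pooled)
```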
2023
ZBL2W at SemEval-2023 Task 9: A Multilingual Fine-tuning Model with Data Augmentation for Tweet Intimacy Analysis
Hao Zhang | Youlin Wu | Junyu Lu | Zewen Bai | Jiangming Wu | Hongfei Lin | Shaowu Zhang
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper describes our system for SemEval-2023 Task 9, Multilingual Tweet Intimacy Analysis. The task poses two key challenges: the complexity of multilingual and zero-shot cross-lingual learning, and the difficulty of mining the semantics of tweet intimacy. To address these problems, our system extracts contextual representations from the pretrained language model XLM-T and employs several optimization methods, including adversarial training, data augmentation, an ordinal regression loss, and a dedicated training strategy. Our system ranked 14th out of 54 participating teams on the leaderboard and 10th on predicting languages not seen in the training data. Our code is available on GitHub.
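A minimal fine-tuning sketch under stated assumptions: the public XLM-T checkpoint "cardiffnlp/twitter-xlm-roberta-base" and a plain regression head trained with MSE. The adversarial training, data augmentation, ordinal regression loss, and dedicated training strategy mentioned above are omitted here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "cardiffnlp/twitter-xlm-roberta-base"   # XLM-T checkpoint (assumption)
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(
    name, num_labels=1, problem_type="regression"
)
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(texts, scores):
    """One gradient step on a batch of (tweet, intimacy score) pairs."""
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    labels = torch.tensor(scores, dtype=torch.float).unsqueeze(-1)
    out = model(**batch, labels=labels)   # MSE loss from the regression head
    out.loss.backward()
    optim.step()
    optim.zero_grad()
    return out.loss.item()
```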
2021
结合标签转移关系的多任务笑点识别方法(Multi-task punchlines recognition method combined with label transfer relationship)
Tongyue Zhang (张童越) | Shaowu Zhang (张绍武) | Bo Xu (徐博) | Liang Yang (杨亮) | Hongfei Lin (林鸿飞)
Proceedings of the 20th Chinese National Conference on Computational Linguistics
Humor plays an important role in human communication and is abundant in sitcoms. The punchline is one of the ways sitcoms achieve a humorous effect. In the sitcom punchline recognition task, each sentence is labeled according to whether it is a punchline, but previous work typically identifies punchlines only by modeling contextual semantic relations and does not fully exploit the labels. To make full use of the information in the label sequence, this paper proposes a new recognition method: a word-level and sentence-level multi-task learning model combined with conditional random fields (CRFs). The model introduces two improvements. First, the transition relation between adjacent labels in the label sequence is regarded as a manifestation of the incongruity described in humor theory, and a CRF is used to learn this transition relation. Second, because learning the transition relations between adjacent labels and learning contextual semantic relations can both capture the incongruity between the setup and the punchline, the two are correlated; to exploit this correlation and improve punchline recognition, the model adopts multi-task learning to jointly learn the meaning of each sentence, the meanings of the words composing each sentence, the word-level label transition relations, and the sentence-level label transition relations. Experiments on the English dataset of the CCL2020 "小牛杯" Humor Computation: Sitcom Punchline Recognition shared task show that the proposed method outperforms the previous best method by 3.2%, achieving the best result on sitcom punchline recognition, and ablation experiments confirm the effectiveness of both improvements.
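A simplified sketch of the sentence-level branch only, assuming pre-encoded sentence vectors, a BiLSTM context layer, and the third-party pytorch-crf package for the label-transition modelling; the word-level task and the joint multi-task losses are not reproduced here.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class PunchlineTagger(nn.Module):
    """Tag each sentence of an episode as punchline / non-punchline."""

    def __init__(self, sent_dim: int = 768, hidden: int = 256, num_tags: int = 2):
        super().__init__()
        self.context = nn.LSTM(sent_dim, hidden, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hidden, num_tags)   # per-sentence emission scores
        self.crf = CRF(num_tags, batch_first=True)    # learns label-transition scores

    def forward(self, sent_vecs, tags=None, mask=None):
        # sent_vecs: (batch, num_sentences, sent_dim) pre-encoded sentence embeddings
        h, _ = self.context(sent_vecs)
        emissions = self.emit(h)
        if tags is not None:                            # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)    # inference: best label sequence
```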
2020
基于多粒度语义交互理解网络的幽默等级识别(A Multi-Granularity Semantic Interaction Understanding Network for Humor Level Recognition)
Jinhui Zhang (张瑾晖) | Shaowu Zhang (张绍武) | Xiaochao Fan (樊小超) | Liang Yang (杨亮) | Hongfei Lin (林鸿飞)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Humor plays an important role in people's daily communication. With the rapid development of artificial intelligence, humor level recognition has become one of the hot research topics in natural language processing. Existing studies on humor level recognition usually treat a humorous text as a whole and ignore the semantic relations inside it. This paper treats humor level recognition as a natural language inference task: the humorous text is divided into a "setup" and a "punchline", their semantics and the semantic relation between them are modeled separately, and a multi-granularity semantic interaction understanding network is proposed to capture semantic association and interaction in the humorous text at both the word and clause granularities. Experiments on the public Reddit humor dataset show that the model improves accuracy by 1.3% over the previous best result. The experiments indicate that introducing information about the semantic relations inside humor improves humor recognition performance, and that the proposed model captures these semantic relations well.
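A rough sketch of the setup/punchline interaction idea at the word granularity only; the encoder, dimensions, and pooling choices are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SetupPunchlineInteraction(nn.Module):
    """Model the semantic relation between a setup and its punchline."""

    def __init__(self, dim: int = 300, heads: int = 4, num_levels: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_levels)   # humor-level classes

    def forward(self, setup_tokens, punchline_tokens):
        # Each input: (batch, seq_len, dim) pre-embedded word sequences.
        # Punchline words attend over the setup, capturing their interaction.
        interacted, _ = self.attn(punchline_tokens, setup_tokens, setup_tokens)
        features = torch.cat([interacted.mean(1), setup_tokens.mean(1)], dim=-1)
        return self.classifier(features)
```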
2018
WECA: A WordNet-Encoded Collocation-Attention Network for Homographic Pun Recognition
Yufeng Diao | Hongfei Lin | Di Wu | Liang Yang | Kan Xu | Zhihao Yang | Jian Wang | Shaowu Zhang | Bo Xu | Dongyu Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Homographic puns have a long history in human writing and are widely used in written and spoken literature, usually occurring in particular syntactic or stylistic structures. Recognizing homographic puns is therefore an important research problem, yet existing work does not handle it well. In this work, we first use WordNet to understand and expand word embeddings in order to resolve the polysemy of homographic puns, and then propose a WordNet-Encoded Collocation-Attention network (WECA), which combines sense information with context weights to recognize puns. Our experiments on SemEval-2017 Task 7 and the Pun of the Day dataset demonstrate that the proposed model can distinguish homographic-pun texts from non-pun texts. We also show that the model is able to select qualitatively informative words. The results show that our model achieves state-of-the-art performance on homographic pun recognition.
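A small sketch of the WordNet step only, using NLTK's WordNet interface to collect candidate senses for a word so that a downstream attention layer could weight collocations; the collocation-attention network itself is not reproduced here.

```python
# Requires: import nltk; nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def sense_glosses(word: str, max_senses: int = 3):
    """Return the definitions of up to `max_senses` WordNet senses for a word."""
    return [s.definition() for s in wn.synsets(word)[:max_senses]]

# Example: a polysemous pun word such as "interest" has several senses whose
# glosses can be encoded alongside the sentence context.
print(sense_glosses("interest"))
```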
Construction of a Chinese Corpus for the Analysis of the Emotionality of Metaphorical Expressions
Dongyu Zhang | Hongfei Lin | Liang Yang | Shaowu Zhang | Bo Xu
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Metaphors are frequently used to convey emotions. However, there is little research on constructing metaphor corpora annotated with emotion for analyzing the emotionality of metaphorical expressions. Furthermore, most studies focus on English; few address emotion analysis of metaphorical texts in other languages, particularly Sino-Tibetan languages such as Chinese, even though emotional expressions in metaphorical usage are likely to differ considerably across languages. We therefore construct a significant new metaphor corpus of 5,605 manually annotated sentences in Chinese. We present an annotation scheme that covers linguistic metaphors, emotional categories (joy, anger, sadness, fear, love, disgust, and surprise), and intensity. Annotation agreement analyses for multiple annotators are described. We also use the corpus to explore and analyze the emotionality of metaphors. To the best of our knowledge, this is the first relatively large Chinese metaphor corpus annotated with emotions.
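A minimal sketch of one way the described annotation scheme could be represented in code; the field names and the intensity scale are illustrative assumptions, not the corpus's released format.

```python
from dataclasses import dataclass
from typing import Literal

# The seven emotional categories named in the annotation scheme.
Emotion = Literal["joy", "anger", "sadness", "fear", "love", "disgust", "surprise"]

@dataclass
class MetaphorAnnotation:
    sentence: str        # the annotated Chinese sentence
    is_metaphor: bool    # whether a linguistic metaphor is present
    emotion: Emotion     # emotional category conveyed by the metaphor
    intensity: int       # annotated emotion intensity (scale is an assumption)
```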