Yayue Deng

Also published as: 雅月


2025

pdf bib
MemeArena: Automating Context-Aware Unbiased Evaluation of Harmfulness Understanding for Multimodal Large Language Models
Zixin Chen | Hongzhan Lin | Kaixin Li | Ziyang Luo | Yayue Deng | Jing Ma
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The proliferation of memes on social media necessitates the capabilities of multimodal Large Language Models (mLLMs) to effectively understand multimodal harmfulness. Existing evaluation approaches predominantly focus on mLLMs’ detection accuracy for binary classification tasks, which often fail to reflect the in-depth interpretive nuance of harmfulness across diverse contexts. In this paper, we propose MemeArena, an agent-based arena-style evaluation framework that provides a context-aware and unbiased assessment for mLLMs’ understanding of multimodal harmfulness. Specifically, MemeArena simulates diverse interpretive contexts to formulate evaluation tasks that elicit perspective-specific analyses from mLLMs. By integrating varied viewpoints and reaching consensus among evaluators, it enables fair and unbiased comparisons of mLLMs’ abilities to interpret multimodal harmfulness. Extensive experiments demonstrate that our framework effectively reduces the evaluation biases of judge agents, with judgment results closely aligning with human preferences, offering valuable insights into reliable and comprehensive mLLM evaluations in multimodal harmfulness understanding. Our code and data are publicly available at https://github.com/Lbotirx/MemeArena.

2021

pdf bib
字里行间的道德:中文文本道德句识别研究(Morality Between the Lines: Research on Identification of Chinese Moral Sentence)
Shiya Peng (彭诗雅) | Chang Liu (刘畅) | Yayue Deng (邓雅月) | Dong Yu (于东)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

随着人工智能的发展,越来越多的研究开始关注人工智能伦理。在NLP领域,道德自动识别作为研究分析文本中的道德的一项重要任务,近年来开始受到研究者的关注。该任务旨在识别文本中的道德片段,其对自然语言处理的道德相关的下游任务如偏见识别消除、判定模型隐形歧视等具有重要意义。与英文相比,目前面向中文的道德识别研究开展缓慢,其主要原因是至今还没有较大型的道德中文数据集为研究提供数据。为解决上述问题,本文在中文语料上进行了中文道德句的标注工作,并初步对识别中文文本道德句进行探索。我们首先构建了国内首个10万级别的中文道德句数据集,然后本文提出了利用流行的几种机器学习方法探究识别中文道德句任务的效果。此外,我们还探索了利用额外知识辅助的方法,对中文道德句的识别任务进行了进一步的探究。