DR-HM: Distill-then-Reinforce Training with Cognition-Aware Data Synthesis for Harmful Meme Detection

Zihan Cheng; Jianxiang Ma; Xiaocui Yang; Peidong Wang; Wen Zhang; Shi Feng; Daling Wang; Yifei Zhang; Mingfu Zhang

Abstract

Harmful memes convey offensive intent through implicit associations between visual symbols and text, so detecting them requires a broad understanding of cultural stereotypes and visual metaphors. Small-scale Multimodal Large Language Models (MLLMs) often lack the knowledge required to identify such implicit hate, whereas large-scale MLLMs, despite their broader knowledge, exhibit systematic labeling bias. To address these challenges, we propose DR-HM, a Distill-then-Reinforce training framework with cognition-aware data synthesis for harmful meme detection, which aims to transfer knowledge from closed-source models while mitigating their biases. DR-HM introduces a six-step structured data synthesis scheme with self-refinement that decomposes meme analysis into a progressive, human-inspired reasoning process, from entity recognition to harmfulness judgment. Based on the synthesized reasoning data, we further adopt a Distill-then-Reinforce training strategy that combines two-stage Supervised Fine-Tuning (SFT) with an Adaptive Group Relative Policy Optimization (A-GRPO) algorithm, which incorporates class-ratio-aware reward weighting and dynamic KL coefficients. Experiments on three benchmark datasets show that the proposed approach consistently outperforms existing methods and achieves an accuracy of 84.7% on the FHM dataset, approaching the reported performance of human annotators.

Anthology ID: 2026.findings-acl.2130
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 42975–42993
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2130/
Cite (ACL): Zihan Cheng, Jianxiang Ma, Xiaocui Yang, Peidong Wang, Wen Zhang, Shi Feng, Daling Wang, Yifei Zhang, and Mingfu Zhang. 2026. DR-HM: Distill-then-Reinforce Training with Cognition-Aware Data Synthesis for Harmful Meme Detection. In Findings of the Association for Computational Linguistics: ACL 2026, pages 42975–42993, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): DR-HM: Distill-then-Reinforce Training with Cognition-Aware Data Synthesis for Harmful Meme Detection (Cheng et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2130.pdf
Checklist: 2026.findings-acl.2130.checklist.pdf
