Yong Zhuang


2025

HVGuard: Utilizing Multimodal Large Language Models for Hateful Video Detection
Yiheng Jing | Mingming Zhang | Yong Zhuang | Jiacheng Guo | Juan Wang | Xiaoyang Xu | Wenzhe Yi | Keyan Guo | Hongxin Hu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

The rapid growth of video platforms has transformed information dissemination and led to an explosion of multimedia content. However, this widespread reach also introduces risks, as some users exploit these platforms to spread hate speech, which is often concealed through complex rhetoric, making hateful video detection a critical challenge. Existing detection methods rely heavily on unimodal analysis or simple feature fusion, struggling to capture cross-modal interactions and reason through implicit hate in sarcasm and metaphor. To address these limitations, we propose HVGuard, the first reasoning-based hateful video detection framework with multimodal large language models (MLLMs). Our approach integrates Chain-of-Thought (CoT) reasoning to enhance multimodal interaction modeling and implicit hate interpretation. Additionally, we design a Mixture-of-Experts (MoE) network for efficient multimodal fusion and final decision-making. The framework is modular and extensible, allowing flexible integration of different MLLMs and encoders. Experimental results demonstrate that HVGuard outperforms all existing advanced detection tools, achieving an improvement of 6.88% to 13.13% in accuracy and 9.21% to 34.37% in M-F1 on two public datasets covering both English and Chinese.

2012

Non-Linear Models for Confidence Estimation
Yong Zhuang | Guillaume Wisniewski | François Yvon
Proceedings of the Seventh Workshop on Statistical Machine Translation