Huanqian Yan
2026
SafeSteer: A Decoding-level Defense Mechanism for Multimodal Large Language Models
Xinyi Zeng | Xue Yang | Jingyuan Zhang | Huanqian Yan | Xiang Chen | Kaiwen Wei | Hankun Kang | Yu Tian
Findings of the Association for Computational Linguistics: ACL 2026
Multimodal large language models (MLLMs) are gaining increasing attention. Because their input features are heterogeneous, MLLMs face significant challenges in defending against jailbreaks. Current defense methods rely on costly fine-tuning or inefficient post-hoc interventions, which limits their ability to address novel attacks and incurs performance trade-offs. To address these issues, we explore the endogenous safety capabilities of MLLMs and quantify their intrinsic ability to discern harmfulness at both the encoding and decoding stages. We observe that (1) MLLMs can distinguish harmful from harmless inputs during the decoding process, and (2) image-based attacks are more stealthy. Based on these insights, we introduce SafeSteer, a decoding-level defense mechanism for MLLMs. Specifically, it employs a lightweight discriminator, built on the MLLM’s own discriminative ability, to iteratively steer the decoding process toward safety. A safety alignment vector is also integrated to handle complex multimodal threats. Experiments on multiple MLLMs demonstrate that our proposed method improves safety performance by up to 33.40% without fine-tuning.
Me-Agent: A Personalized Mobile Agent with Two-Level User Habit Learning for Enhanced Interaction
Shuoxin Wang | Chang Liu | Gowen Loo | Lifan Zheng | Kaiwen Wei | Huanqian Yan | Xinyi Zeng | Jingyuan Zhang | Yu Tian
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Model (LLM)-based mobile agents have made significant performance advances. However, these agents often follow explicit user instructions while overlooking personalized needs. This leads to significant limitations for real users, particularly when personalized context is missing: (1) inability to interpret ambiguous instructions, (2) lack of learning from user interaction history, and (3) failure to handle personalized instructions. To alleviate these challenges, we propose Me-Agent, a learnable and memorable personalized mobile agent. Specifically, Me-Agent incorporates a two-level user habit learning approach. At the prompt level, we design a user preference learning strategy enhanced with a Personal Reward Model to improve personalization performance. At the memory level, we design a Hierarchical Preference Memory, which stores users’ long-term memory and app-specific memory at separate levels. To validate the personalization capabilities of mobile agents, we introduce User FingerTip, a new benchmark featuring numerous ambiguous instructions drawn from daily life. Extensive experiments on User FingerTip and general benchmarks demonstrate that Me-Agent achieves state-of-the-art personalization performance while maintaining competitive instruction execution performance.