SU Yao
Also published as: Su Yao
2026
When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life
Xinyue Lou | Xu Jinan | Jingyi Yin | Xiaolong Wang | Zhaolu Kang | Liaoyouwei | Yixuan Wang | Xiangyu Shi | Fengran Mo | SU Yao | Kaiyu Huang
Findings of the Association for Computational Linguistics: ACL 2026
Xinyue Lou | Xu Jinan | Jingyi Yin | Xiaolong Wang | Zhaolu Kang | Liaoyouwei | Yixuan Wang | Xiangyu Shi | Fengran Mo | SU Yao | Kaiyu Huang
Findings of the Association for Computational Linguistics: ACL 2026
As Multimodal Large Language Models (MLLMs) become an indispensable assistant in human life, the unsafe content generated by MLLMs poses a danger to human behavior, perpetually overhanging human society like a sword of Damocles. To investigate and evaluate the safety impact of MLLMs responses on human behavior in daily life, we introduce SaLAD, a multimodal satety benchmark which contains 2,013 real-world image–text samples across 10 common categories, with a balanced design covering both unsafe scenarios and cases of oversensitivity. It emphasizes realistic risk exposure, authentic visual inputs, and fine-grained cross-modal reasoning, ensuring that safety risks cannot be inferred from text alone. We further propose a safety-warning-based evaluation framework that encourages models to provide clear and informative safety warnings, rather than generic refusals. Results on 18 MLLMs demonstrate that the top-performing models achieve a safe response rate of only 57.2% on unsafe queries. Morevoer, even popular safety alignment methods limit effectiveness of the models in our scenario, revealing the vulnerabilities of current MLLMs in identifying dangerous behaviors in daily life. Our dataset is available at https://github.com/xinyuelou/SaLAD.
2024
Walking in Others’ Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
Rongwu Xu | Zian Zhou | Tianwei Zhang | Zehan Qi | Su Yao | Ke Xu | Wei Xu | Han Qiu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Rongwu Xu | Zian Zhou | Tianwei Zhang | Zehan Qi | Su Yao | Ke Xu | Wei Xu | Han Qiu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The common toxicity and societal bias in contents generated by large language models (LLMs) necessitate strategies to reduce harm. Present solutions often demand white-box access to the model or substantial training, which is impractical for cutting-edge commercial LLMs. Moreover, prevailing prompting methods depend on external tool feedback and fail to simultaneously lessen toxicity and bias. Motivated by social psychology principles, we propose a novel strategy named perspective-taking prompting (PeT) that inspires LLMs to integrate diverse human perspectives and self-regulate their responses. This self-correction mechanism can significantly diminish toxicity (up to 89%) and bias (up to 73%) in LLMs’ responses. Rigorous evaluations and ablation studies are conducted on two commercial LLMs (ChatGPT and GLM) and three open-source LLMs, revealing PeT’s superiority in producing less harmful responses, outperforming five strong baselines.