Yang Liu

Other people with similar names: Yang Liu, Yang Liu (Wilfrid Laurier University), Yang Liu (刘扬; Ph.D. Purdue; ICSI, Dallas, Facebook, Liulishuo, Amazon), Yang Liu, Yang Liu (刘洋; ICT, Tsinghua, Beijing Academy of Artificial Intelligence), Yang Liu (Edinburgh Ph.D., Microsoft), Yang Liu (University of Helsinki), Yang Liu (Samsung Research Center Beijing), Yang Liu (Tianjin University, China), Yang Liu, Yang Liu (Microsoft Cognitive Services Research), Yang Liu (Univ. of Michigan, UC Santa Cruz), Yang Liu, Yang Liu (National University of Defense Technology), Yang Liu, Yang Liu, Yang Janet Liu (Georgetown University; 刘洋), Yang Liu (刘扬; Peking University), Yang Liu (The Chinese University of Hong Kong (Shenzhen)), Yang Liu, Yang Liu, Yang Liu (3M Health Information Systems), Yang Liu (Beijing Language and Culture University)


2025

Knowledge Image Matters: Improving Knowledge-Based Visual Reasoning with Multi-Image Large Language Models
Guanghui Ye | Huan Zhao | Zhixue Zhao | Xupeng Zha | Yang Liu | Zhihua Jiang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We revisit knowledge-based visual reasoning (KB-VR) in light of modern advances in multimodal large language models (MLLMs), and make the following contributions: (i) We propose the Visual Knowledge Card (VKC), a novel image that incorporates not only internal visual knowledge (e.g., scene-aware information) detected from the raw image, but also external world knowledge (e.g., attribute or object knowledge) produced by a knowledge generator; (ii) We present VKC-based Multi-Image Reasoning (VKC-MIR), a four-stage pipeline that uses a state-of-the-art scene perception engine to construct an initial VKC (Stage-1), an LLM to generate relevant domain knowledge (Stage-2), an image editing toolkit to incorporate the generated knowledge into an updated VKC (Stage-3), and, finally, a multi-image MLLM to solve the VKC-enhanced task (Stage-4). In experiments on three popular KB-VR benchmarks, our approach achieves new state-of-the-art results, surpassing previous top-performing models.
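
The abstract describes the four-stage VKC-MIR pipeline only at a high level; the following is a minimal Python sketch of how those stages might be wired together. All names and signatures here (VKCMIRPipeline, perceive_scene, generate_knowledge, render_vkc, answer) are hypothetical placeholders chosen for illustration, not the authors' actual implementation.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VKCMIRPipeline:
    """Minimal sketch of the four-stage VKC-MIR pipeline from the abstract.
    All component names and signatures are hypothetical placeholders."""
    perceive_scene: Callable[[bytes], dict]               # Stage-1: scene perception engine -> initial VKC content
    generate_knowledge: Callable[[dict, str], List[str]]  # Stage-2: LLM generates relevant domain knowledge
    render_vkc: Callable[[dict, List[str]], bytes]        # Stage-3: image editing step writes knowledge into the VKC image
    answer: Callable[[List[bytes], str], str]             # Stage-4: multi-image MLLM reasons over raw image + VKC

    def run(self, raw_image: bytes, question: str) -> str:
        scene = self.perceive_scene(raw_image)                 # internal visual knowledge from the raw image
        knowledge = self.generate_knowledge(scene, question)   # external world knowledge (attributes, objects)
        vkc_image = self.render_vkc(scene, knowledge)          # the VKC passed alongside the raw image
        return self.answer([raw_image, vkc_image], question)   # multi-image reasoning over both images

Under these assumptions, a concrete system would instantiate the four callables with its chosen scene perception engine, LLM, image editor, and multi-image MLLM, then call run(raw_image, question) once per KB-VR instance.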