Xinhao Li

Also published as: 鑫豪


2025

Object hallucinations in Large Vision-Language Models (LVLMs) significantly impede their real-world applicability. Because the visual encoder is the primary component responsible for interpreting visual information, the choice of encoder is pivotal. We hypothesize that the diverse training paradigms used by different visual encoders instill distinct inductive biases, which in turn lead to different hallucination behaviors. Existing benchmarks typically focus on coarse-grained hallucination detection and fail to capture the diverse hallucination types our hypothesis describes. To analyze these effects systematically, we introduce VHBench-10, a comprehensive benchmark for evaluating LVLMs across ten fine-grained hallucination categories. Our evaluations confirm that encoders exhibit distinct hallucination characteristics. Building on these insights, and on the observation that simple feature fusion is suboptimal, we propose VisionWeaver, a novel Context-Aware Routing Network. It uses global visual features to generate routing signals that dynamically aggregate visual features from multiple specialized experts. Comprehensive experiments confirm that VisionWeaver significantly reduces hallucinations and improves overall model performance. Our code and benchmark are available at https://github.com/whwangovo/VisionWeaver.
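The abstract describes the routing mechanism only at a high level. Below is a minimal sketch of one way context-aware soft routing could work, assuming a single linear router that maps a global image feature to one logit per expert, followed by a softmax that weights the experts' features. The function names, the linear router, and the dense (non-top-k) weighting are illustrative assumptions, not VisionWeaver's actual implementation; the linked repository is the authoritative source.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over routing logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_and_fuse(
    global_feature: Sequence[float],
    expert_features: Sequence[Sequence[float]],
    router_weights: Sequence[Sequence[float]],
) -> Tuple[Vector, Vector]:
    """Hypothetical context-aware soft routing (assumed design, not the paper's code).

    global_feature:  a global image descriptor (assumed, e.g. a pooled feature).
    expert_features: one equal-length feature vector per visual encoder ("expert").
    router_weights:  one row per expert; logit_i = dot(router_weights[i], global_feature).
    Returns (fused feature, per-expert gate weights).
    """
    if not expert_features:
        raise ValueError("need at least one expert")
    if len(router_weights) != len(expert_features):
        raise ValueError("need one router row per expert")
    if any(len(row) != len(global_feature) for row in router_weights):
        raise ValueError("router rows must match the global feature length")

    # Routing signal: the global feature decides how much each expert contributes.
    logits = [sum(w * g for w, g in zip(row, global_feature)) for row in router_weights]
    gates = softmax(logits)

    # Weighted sum of expert features, dimension by dimension.
    dim = len(expert_features[0])
    fused = [0.0] * dim
    for gate, feats in zip(gates, expert_features):
        for j in range(dim):
            fused[j] += gate * feats[j]
    return fused, gates
```

With an all-zero router the gates are uniform and the result reduces to plain averaging, which is the kind of context-independent fusion the abstract calls suboptimal; a trained router instead lets each image shift weight toward the encoders that suit it.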

2023

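The diverse ethnic composition and long historical heritage of the Qinghai-Tibet region have given rise to a rich and distinctive Qinghai-Tibetan culture, making this sacred land of snow a veritable "treasure house of plateau culture". However, constrained by poor transportation access and a relatively underdeveloped economy, the protection and promotion of the region's cultural and tourism resources have long lagged behind. Taking a digital-humanities approach, this paper jointly extracts entities and relations from text within a prompt-learning framework, achieving knowledge extraction under low-resource conditions and establishing a paradigm for constructing cultural-tourism knowledge graphs. Using Ta'er Monastery (塔尔寺, also known as Kumbum Monastery), a Major Historical and Cultural Site Protected at the National Level, as a representative case, it describes the complete pipeline for building the Ta'er Monastery knowledge graph, from ontology design and raw data acquisition through knowledge extraction to visualization. The resulting knowledge graph contains 4,705 nodes and 17,386 relations. This work addresses the shortage of structured data on Qinghai-Tibetan culture in the humanities and provides a reference for digital-humanities research on Qinghai-Tibet cultural tourism.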
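The abstract does not say how extracted triples are stored or how nodes and relations are counted. A minimal sketch, assuming the extraction step yields (head, relation, tail) string triples, that nodes are the distinct entities, and that exact duplicate triples are merged; `build_graph` and `adjacency` are hypothetical helpers, not part of the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation type, tail entity)


def build_graph(triples: Iterable[Triple]) -> Tuple[Set[str], List[Triple]]:
    """Assemble extracted triples into a simple knowledge graph (assumed counting rules).

    Nodes are the distinct head/tail entities; relations are the distinct triples,
    so the same fact extracted from several passages is stored once.
    """
    nodes: Set[str] = set()
    seen: Set[Triple] = set()
    edges: List[Triple] = []
    for head, rel, tail in triples:
        key = (head.strip(), rel.strip(), tail.strip())
        if not all(key):
            continue  # skip incomplete extractions with an empty field
        nodes.update((key[0], key[2]))
        if key not in seen:
            seen.add(key)
            edges.append(key)
    return nodes, edges


def adjacency(edges: List[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Outgoing (relation, tail) pairs per head entity, e.g. for visualization."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for head, rel, tail in edges:
        adj.setdefault(head, []).append((rel, tail))
    return adj
```

Under these assumptions, `len(nodes)` and `len(edges)` would correspond to node and relation counts such as those reported above; a real pipeline would also need entity normalization (aliases, alternate names) before counting.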