Jiaqi Wang
Other people with similar names: Jiaqi Wang, Jiaqi Wang
Unverified author pages with similar names: Jiaqi Wang
2026
VideoPro: Adaptive Program Reasoning for Long Video Understanding
Chenglin Li | Feng Han | Yikun Wang | Ruilin Li | Shuai Dong | Haowen Hou | Haitao Li | Qianglong Chen | Feng Tao | Jingqi Tong | Yin Zhang | Jiaqi Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Understanding long videos remains challenging due to the sparsity of visual evidence relevant to a given query. Prior work has explored program-based visual grounding, typically relying on executable programs generated by auxiliary large language models. However, when scaling to long videos, existing approaches face several critical limitations: (1) frame-centric vision modules are often insufficient for long video processing; (2) naively applying program-based reasoning to all queries incurs considerable computational overhead; and (3) errors arising from low-confidence predictions and imperfect program execution are difficult to recover from. To address these challenges, we propose VideoPro, a unified framework that enables VideoLLMs to adaptively reason over long videos and refine their predictions through executable programs. VideoPro first performs adaptive reasoning, dynamically determining whether a query can be resolved directly by the native VideoLLM or requires explicit multi-step program reasoning. For complex queries, the model decomposes the task into executable programs that invoke specialized vision modules for precise temporal and semantic grounding. To further improve robustness, VideoPro incorporates a self-refinement mechanism that leverages execution feedback and confidence signals to correct erroneous executions and refine low-confidence reasoning programs. By tightly integrating adaptive reasoning with self-refinement, VideoPro consistently outperforms prior methods across multiple long-video understanding benchmarks, yielding an average 6.7% improvement for Qwen3-VL-8B.
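The control flow described in this abstract (answer directly when the native model is confident, otherwise fall back to program reasoning and refine using execution feedback) can be illustrated with a minimal sketch. It is not the authors' implementation: the `Answer` type, the `direct_model` and `program_runner` callables, the 0.7 threshold, and the retry budget are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to lie in [0, 1]


# Hypothetical interfaces, not taken from the paper:
# direct_model(query) -> Answer
# program_runner(query, feedback) -> (Answer or None, error message or None)
DirectModel = Callable[[str], Answer]
ProgramRunner = Callable[[str, Optional[str]], Tuple[Optional[Answer], Optional[str]]]


def adaptive_answer(
    query: str,
    direct_model: DirectModel,
    program_runner: ProgramRunner,
    threshold: float = 0.7,      # assumed confidence cut-off
    max_refinements: int = 2,    # assumed retry budget
) -> Answer:
    """Route a query to direct answering or to program reasoning with refinement."""
    direct = direct_model(query)
    if direct.confidence >= threshold:
        # Simple query: skip the more expensive program path entirely.
        return direct

    best = direct
    feedback: Optional[str] = None
    for _ in range(max_refinements + 1):
        result, error = program_runner(query, feedback)
        if error is not None:
            # Execution failed: pass the error back as feedback and retry.
            feedback = error
            continue
        if result.confidence > best.confidence:
            best = result
        if result.confidence >= threshold:
            break
        # Program ran but the answer is still low-confidence: ask for a refinement.
        feedback = "low confidence"
    return best
```

In this sketch the direct answer is kept as a fallback, so a query whose programs all fail still returns the best answer seen so far.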
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Junbo Niu | Zheng Liu | Zhuangcheng Gu | Bin Wang | Linke Ouyang | Zhiyuan Zhao | Tao Chu | Tianyao He | Fan Wu | Qintong Zhang | Zhenjiang Jin | Guang Liang | Rui Zhang | Wenzheng Zhang | Yuan Qu | Zhifei Ren | Yuefeng Sun | Zirui Tang | Boyu Niu | Yuanhong Zheng | Dongsheng Ma | Ziyang Miao | Hejun Dong | Siyi Qian | Junyuan Zhang | Fangdong Wang | Jingzhou Chen | Xiaomeng Zhao | Liqun Wei | Wei Li | Shasha Wang | RuiLiang Xu | Yuanyuan Cao | Lu Chen | Qianqian Wu | Huaiyu Gu | Lindong Lu | Dechen Lin | Shenguanlin | Xuanhe Zhou | Linfeng Zhang | Yuhang Zang | Xiaoyi Dong | Jiaqi Wang | Bo Zhang | Lei Bai | Pei Chu | Weijia Li | Jiang Wu | Lijun Wu | Zhenxiang Li | Guangyu Wang | Zhongying Tu | Chao Xu | Kai Chen | Bowen Zhou | Dahua Lin | Wentao Zhang | Conghui He
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
We introduce MinerU2.5, a 1.2B-parameter document parsing vision-language model that achieves state-of-the-art recognition accuracy while maintaining exceptional computational efficiency. Our approach employs a coarse-to-fine, two-stage parsing strategy that decouples global layout analysis from local content recognition. In the first stage, the model performs efficient layout analysis on downsampled images to identify structural elements, circumventing the computational overhead of processing high-resolution inputs. In the second stage, guided by the global layout, it performs targeted content recognition on native-resolution crops extracted from the original image, preserving fine-grained details in dense text, complex formulas, and tables. To support this strategy, we developed a comprehensive data engine that generates diverse, large-scale training corpora for both pretraining and fine-tuning. Ultimately, MinerU2.5 demonstrates strong document parsing ability, achieving state-of-the-art performance on multiple benchmarks, surpassing both general-purpose and domain-specific models across various recognition tasks, while maintaining significantly lower computational overhead.
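One step implied by the two-stage strategy in this abstract is carrying layout boxes found on the downsampled image back to the original image, so that crops can be taken at native resolution. The sketch below shows only that coordinate arithmetic, under assumptions: boxes are `(x0, y0, x1, y1)` pixel tuples, sizes are `(width, height)`, and the function name `scale_box` is hypothetical. It says nothing about how MinerU2.5 itself represents or predicts layouts.

```python
import math
from typing import Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels, assumed convention
Size = Tuple[int, int]            # (width, height)


def scale_box(box: Box, small_size: Size, full_size: Size) -> Box:
    """Map a box from the downsampled image to native-resolution coordinates."""
    sx = full_size[0] / small_size[0]
    sy = full_size[1] / small_size[1]
    x0, y0, x1, y1 = box
    # Round outward so the crop never cuts into the detected region,
    # then clamp to the bounds of the full-resolution image.
    left = max(0, math.floor(x0 * sx))
    top = max(0, math.floor(y0 * sy))
    right = min(full_size[0], math.ceil(x1 * sx))
    bottom = min(full_size[1], math.ceil(y1 * sy))
    return (left, top, right, bottom)
```

Rounding outward rather than to the nearest pixel is a design choice in this sketch: it trades a few extra pixels per crop for the guarantee that fine detail at the region border is preserved.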
Co-authors
- Lei Bai 1
- Yuanyuan Cao 1
- Jingzhou Chen 1
- Kai Chen 1
- Lu Chen 1
- Qianglong Chen 1
- Pei Chu 1
- Tao Chu 1
- Hejun Dong 1
- Shuai Dong 1
- Xiaoyi Dong 1
- Huaiyu Gu 1
- Zhuangcheng Gu 1
- Feng Han 1
- Conghui He 1
- Tianyao He 1
- Haowen Hou 1
- Zhenjiang Jin 1
- Chenglin Li 1
- Haitao Li 1
- Ruilin Li 1
- Wei Li 1
- Weijia Li 1
- Zhenxiang Li 1
- Guang Liang 1
- Dahua Lin 1
- Dechen Lin 1
- Zheng Liu 1
- Lindong Lu 1
- Dongsheng Ma 1
- Ziyang Miao 1
- Boyu Niu 1
- Junbo Niu 1
- Linke Ouyang 1
- Siyi Qian 1
- Yuan Qu 1
- Zhifei Ren 1
- Shenguanlin 1
- Yuefeng Sun 1
- Zirui Tang 1
- Feng Tao 1
- Jingqi Tong 1
- Zhongying Tu 1
- Bin Wang 1
- Fangdong Wang 1
- Guangyu Wang 1
- Shasha Wang 1
- Yikun Wang 1
- Liqun Wei 1
- Fan Wu 1
- Jiang Wu 1
- Lijun Wu 1
- Qianqian Wu 1
- Chao Xu 1
- RuiLiang Xu 1
- Yuhang Zang 1
- Bo Zhang 1
- Junyuan Zhang 1
- Linfeng Zhang 1
- Qintong Zhang 1
- Rui Zhang 1
- Wentao Zhang 1
- Wenzheng Zhang 1
- Yin Zhang 1
- Xiaomeng Zhao 1
- Zhiyuan Zhao 1
- Yuanhong Zheng 1
- Bowen Zhou 1
- Xuanhe Zhou 1
Venues
- ACL 2