Shasha Wang
2026
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
Junbo Niu | Zheng Liu | Zhuangcheng Gu | Bin Wang | Linke Ouyang | Zhiyuan Zhao | Tao Chu | Tianyao He | Fan Wu | Qintong Zhang | Zhenjiang Jin | Guang Liang | Rui Zhang | Wenzheng Zhang | Yuan Qu | Zhifei Ren | Yuefeng Sun | Zirui Tang | Boyu Niu | Yuanhong Zheng | Dongsheng Ma | Ziyang Miao | Hejun Dong | Siyi Qian | Junyuan Zhang | Fangdong Wang | Jingzhou Chen | Xiaomeng Zhao | Liqun Wei | Wei Li | Shasha Wang | RuiLiang Xu | Yuanyuan Cao | Lu Chen | Qianqian Wu | Huaiyu Gu | Lindong Lu | Dechen Lin | Shenguanlin | Xuanhe Zhou | Linfeng Zhang | Yuhang Zang | Xiaoyi Dong | Jiaqi Wang | Bo Zhang | Lei Bai | Pei Chu | Weijia Li | Jiang Wu | Lijun Wu | Zhenxiang Li | Guangyu Wang | Zhongying Tu | Chao Xu | Kai Chen | Bowen Zhou | Dahua Lin | Wentao Zhang | Conghui He
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
We introduce MinerU2.5, a 1.2B-parameter document parsing vision-language model that achieves state-of-the-art recognition accuracy while maintaining exceptional computational efficiency. Our approach employs a coarse-to-fine, two-stage parsing strategy that decouples global layout analysis from local content recognition. In the first stage, the model performs efficient layout analysis on downsampled images to identify structural elements, circumventing the computational overhead of processing high-resolution inputs. In the second stage, guided by the global layout, it performs targeted content recognition on native-resolution crops extracted from the original image, preserving fine-grained details in dense text, complex formulas, and tables. To support this strategy, we developed a comprehensive data engine that generates diverse, large-scale training corpora for both pretraining and fine-tuning. Ultimately, MinerU2.5 demonstrates strong document parsing ability, achieving state-of-the-art performance on multiple benchmarks, surpassing both general-purpose and domain-specific models across various recognition tasks, while maintaining significantly lower computational overhead.
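The two-stage flow described in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `Region`, `thumbnail_size`, `to_native_box`, `parse_page`, the `detect_layout` and `recognize_crop` callables, and the `max_side=1000` default are all hypothetical names and values chosen for the example. It only shows the geometry of the idea: layout is found on a downsampled page, and each detected box is mapped back to native-resolution coordinates before recognition.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Size = Tuple[int, int]                  # (width, height) in pixels
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass(frozen=True)
class Region:
    """A layout element found on the downsampled page (hypothetical type)."""
    label: str
    box: Box


def thumbnail_size(native: Size, max_side: int) -> Size:
    """Shrink the page so its longer side is at most max_side (never upscale)."""
    w, h = native
    scale = min(1.0, max_side / max(w, h))
    return max(1, round(w * scale)), max(1, round(h * scale))


def to_native_box(box: Box, thumb: Size, native: Size) -> Tuple[int, int, int, int]:
    """Map a thumbnail-space box to native pixels, rounding outward and clamping."""
    tw, th = thumb
    nw, nh = native
    x0, y0, x1, y1 = box
    return (
        max(0, math.floor(x0 * nw / tw)),
        max(0, math.floor(y0 * nh / th)),
        min(nw, math.ceil(x1 * nw / tw)),
        min(nh, math.ceil(y1 * nh / th)),
    )


def parse_page(
    native: Size,
    detect_layout: Callable[[Size], List[Region]],
    recognize_crop: Callable[[str, Tuple[int, int, int, int]], object],
    max_side: int = 1000,  # assumed value, not taken from the paper
) -> list:
    """Stage 1: layout on the thumbnail. Stage 2: recognition on native crops."""
    thumb = thumbnail_size(native, max_side)
    results = []
    for region in detect_layout(thumb):
        crop = to_native_box(region.box, thumb, native)
        results.append((region.label, recognize_crop(region.label, crop)))
    return results
```

In practice the two callables would wrap model calls; here they are left as parameters so the coordinate handling can be checked on its own.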
2025
OpenHuEval: Evaluating Large Language Model on Hungarian Specifics
Haote Yang | Xingjian Wei | Jiang Wu | Noémi Ligeti-Nagy | Jiaxing Sun | Yinfan Wang | Zijian Győző Yang | Junyuan Gao | Jingchao Wang | Bowen Jiang | Shasha Wang | Nanjun Yu | Zihao Zhang | Shixin Hong | Hongwei Liu | Wei Li | Songyang Zhang | Dahua Lin | Lijun Wu | Gábor Prószéky | Conghui He
Findings of the Association for Computational Linguistics: ACL 2025
We introduce OpenHuEval, the first benchmark for LLMs focusing on the Hungarian language and specifics. OpenHuEval is constructed from a vast collection of Hungarian-specific materials drawn from multiple sources. In its construction, we incorporated the latest design principles for evaluating LLMs, such as using real user queries from the internet, emphasizing the assessment of LLMs’ generative capabilities, and employing LLM-as-judge to enhance the multidimensionality and accuracy of evaluations. Ultimately, OpenHuEval encompasses eight Hungarian-specific dimensions, featuring five tasks and 3953 questions. Consequently, OpenHuEval provides a comprehensive, in-depth, and scientifically accurate assessment of LLM performance in the context of the Hungarian language and its specifics. We evaluated current mainstream LLMs, including both traditional LLMs and recently developed Large Reasoning Models (LRMs). The results demonstrate a clear need for evaluation and model optimization tailored to the Hungarian language and specifics. We also established a framework for analyzing the thinking processes of LRMs with OpenHuEval, revealing intrinsic patterns and mechanisms of these models in non-English languages, with Hungarian serving as a representative example. We will release OpenHuEval at https://github.com/opendatalab/OpenHuEval.
Co-authors
- Conghui He 2
- Wei Li 2
- Dahua Lin 2
- Jiang Wu 2
- Lijun Wu 2
- Lei Bai 1
- Yuanyuan Cao 1
- Jingzhou Chen 1
- Kai Chen 1
- Lu Chen 1
- Pei Chu 1
- Tao Chu 1
- Hejun Dong 1
- Xiaoyi Dong 1
- Junyuan Gao 1
- Huaiyu Gu 1
- Zhuangcheng Gu 1
- Tianyao He 1
- Shixin Hong 1
- Bowen Jiang 1
- Zhenjiang Jin 1
- Weijia Li 1
- Zhenxiang Li 1
- Guang Liang 1
- Noémi Ligeti-Nagy 1
- Dechen Lin 1
- Hongwei Liu 1
- Zheng Liu 1
- Lindong Lu 1
- Dongsheng Ma 1
- Ziyang Miao 1
- Boyu Niu 1
- Junbo Niu 1
- Linke Ouyang 1
- Gábor Prószéky 1
- Siyi Qian 1
- Yuan Qu 1
- Zhifei Ren 1
- Shenguanlin 1
- Jiaxing Sun 1
- Yuefeng Sun 1
- Zirui Tang 1
- Zhongying Tu 1
- Bin Wang 1
- Fangdong Wang 1
- Guangyu Wang 1
- Jiaqi Wang 1
- Jingchao Wang 1
- Yinfan Wang 1
- Liqun Wei 1
- Xingjian Wei 1
- Fan Wu 1
- Qianqian Wu 1
- Chao Xu 1
- RuiLiang Xu 1
- Haote Yang 1
- Zijian Győző Yang 1
- Nanjun Yu 1
- Yuhang Zang 1
- Bo Zhang 1
- Junyuan Zhang 1
- Linfeng Zhang 1
- Qintong Zhang 1
- Rui Zhang 1
- Songyang Zhang 1
- Wentao Zhang 1
- Wenzheng Zhang 1
- Zihao Zhang 1
- Xiaomeng Zhao 1
- Zhiyuan Zhao 1
- Yuanhong Zheng 1
- Bowen Zhou 1
- Xuanhe Zhou 1