Jiaming Zhang
2026
GUITester: Enabling GUI Agents for Exploratory Defect Discovery
Yifei Gao | Jiang Wu | Xiaoyi Chen | Yifan Yang | Zhe Cui | Tianyi Ma | Jiaming Zhang | Jitao Sang
Findings of the Association for Computational Linguistics: ACL 2026
Exploratory GUI testing is essential for software quality but suffers from high manual costs. While Multi-modal Large Language Model (MLLM) agents excel in navigation, they fail to autonomously discover defects due to two core challenges: Goal-Oriented Masking, where agents prioritize task completion over reporting anomalies, and Execution-Bias Attribution, where system defects are misidentified as agent errors. To address these, we first introduce GUITestBench, the first interactive benchmark for this task, featuring 143 tasks across 26 defects. We then propose GUITester, a multi-agent framework that decouples navigation from verification via two modules: (i) a Planning-Execution Module (PEM) that proactively probes for defects via embedded testing intents, and (ii) a Hierarchical Reflection Module (HRM) that resolves attribution ambiguity through interaction history analysis. GUITester achieves an F1-score of 48.90% (Pass@3) on GUITestBench, outperforming state-of-the-art baselines (33.35%). Our work demonstrates the feasibility of autonomous exploratory testing and provides a robust foundation for future GUI quality assurance.
For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
Wenlong Deng | Qi Zeng | Jiaming Zhang | Minghui Chen | Zixin Ding | Christos Thrampoulidis | Boying Gong | Xiaoxiao Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Data valuation is essential for enhancing the transparency and accountability of large language models (LLMs) and vision-language models (VLMs). However, existing methods typically rely on gradient computations, making them computationally prohibitive for billion-parameter models and precluding batch parallelization. In this work, we introduce For-Value, a forward-only data valuation framework that enables efficient, batch-scalable value estimation while maintaining effectiveness. Leveraging the expressive power of pretrained LLMs/VLMs, we theoretically demonstrate that data valuation can be captured by the alignment between the final hidden representations and prediction errors at the last layer. In light of this insight, For-Value computes data value using a simple closed-form expression with a single forward pass, eliminating the need for costly backpropagation and enabling efficient batch computation at scale. Extensive experiments show that For-Value matches or outperforms gradient-based baselines in detecting influential data and mislabeled data, while achieving significant efficiency improvements.
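As a rough illustration of the "alignment between final hidden representations and prediction errors" idea, the sketch below scores a training example against a validation example using only forward-pass quantities. It is not the paper's closed-form expression, which the abstract does not give. It assumes a linear last layer trained with cross-entropy, for which the last-layer gradient is the outer product of the prediction error and the hidden state, so the inner product of two such gradients factorizes into a hidden-state similarity times an error similarity. The function names and the inputs (hidden vectors, logits, labels) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_error(logits, label):
    """One-hot target minus predicted probabilities (last-layer error)."""
    probs = softmax(logits)
    return [(1.0 if i == label else 0.0) - p for i, p in enumerate(probs)]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def alignment_score(h_train, logits_train, y_train, h_val, logits_val, y_val):
    """Hypothetical forward-only score (an assumption, not the paper's formula):
    hidden-state similarity times prediction-error similarity.
    No backpropagation is needed; every input comes from a forward pass."""
    e_train = prediction_error(logits_train, y_train)
    e_val = prediction_error(logits_val, y_val)
    return dot(h_train, h_val) * dot(e_train, e_val)
```

Because the score only uses per-example hidden states and logits, it can be computed for a whole batch at once, which is the efficiency argument the abstract makes.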
DentalGPT: Incentivizing Multimodal Reasoning in Dentistry
Zhenyang Cai | Jiaming Zhang | Junjie Zhao | Ziyi Zeng | Yanchao Li | Liang Jingyi | Junying Chen | Yunjin Yang | Jiajun You | Shuzhi Deng | Xieruiqiii | Yuanting Chen | Xiangyi Feng | Jianquan Li | Liangyi Chen | Junwen Wang | Shan Jiang | Benyou Wang
Findings of the Association for Computational Linguistics: ACL 2026
Reliable interpretation of multimodal dental data is essential for automated oral healthcare, yet current multimodal large language models (MLLMs) show limited understanding of dental images. Although complex reasoning improves performance, its gains in dentistry are substantially smaller than in other medical domains, suggesting that complex reasoning is not yet sufficiently incentivized for dental diagnosis, likely due to insufficient domain knowledge and limited reinforcement learning on dental questions. We present DentalGPT, a dentistry-specialized MLLM trained via staged multimodal alignment and reinforcement learning. By constructing the largest annotated multimodal dental dataset to date with over 120k images, multimodal alignment provides the necessary domain knowledge foundation to support and incentivize complex reasoning, which is further strengthened through reinforcement learning. Experiments on expert-annotated benchmarks and dental subsets of medical VQA benchmarks show that DentalGPT achieves superior performance on disease classification and dental VQA tasks, outperforming many state-of-the-art MLLMs despite its compact 7B parameter scale.
2025
Investigating and Enhancing Vision-Audio Capability in Omnimodal Large Language Models
Rui Hu | Delai Qiu | Shuyu Wei | Jiaming Zhang | Yining Wang | Shengping Liu | Jitao Sang
Findings of the Association for Computational Linguistics: ACL 2025
Omnimodal Large Language Models (OLLMs) have shown significant progress in integrating vision and text, but still struggle with integrating vision and audio, often exhibiting suboptimal performance when processing audio queries compared to text queries. This disparity is primarily due to insufficient alignment between vision and audio modalities during training, leading to inadequate attention to visual information when using audio queries. To mitigate this issue, we propose a Self-Knowledge Distillation (Self-KD) training method where the vision-text component of the OLLM serves as the teacher and the vision-audio component as the student. This enables the model to process audio in a manner analogous to its text processing. Our experimental results demonstrate that Self-KD is an effective method for enhancing the vision-audio capabilities of OLLMs by learning from the vision-text components, which subsequently improves the interaction between audio and images and results in improved performance on multimodal tasks.
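The teacher/student setup described above can be sketched with the standard knowledge-distillation objective: a KL divergence between the model's output distribution for a text query (teacher) and for the equivalent audio query (student). The abstract does not state which loss Self-KD uses, so the KL form, the temperature parameter, and the function names here are assumptions for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def self_kd_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) for one token position.

    teacher_logits: logits for the image + text query (assumed input).
    student_logits: logits for the same image + spoken query (assumed input).
    In training, the teacher side would be treated as a fixed target.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

The loss is zero when the audio-query distribution matches the text-query distribution and grows as they diverge, which is the sense in which the student learns to process audio the way the model already processes text.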
Surge: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors
Bohan Lyu | Siqiao Huang | Zichen Liang | Qian Sun | Jiaming Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Neural surrogate models are powerful and efficient tools in data mining. Meanwhile, large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, such as generation and understanding. However, an equally important yet underexplored question is whether LLMs can serve as surrogate models for code execution prediction. To investigate this question systematically, we introduce SURGE, a comprehensive benchmark with 1160 problems covering 8 key aspects: multi-language programming tasks, competition-level programming problems, repository-level code analysis, high-cost scientific computing, time-complexity-intensive algorithms, buggy code analysis, programs dependent on specific compilers or execution environments, and formal mathematical proof verification. Through extensive analysis of 21 open-source and proprietary LLMs, we examine scaling laws, data efficiency, and predictive accuracy. Our findings reveal important insights about the feasibility of LLMs as efficient surrogates for computational processes. The benchmark and evaluation framework are available at https://github.com/Imbernoulli/SURGE.
Co-authors
- Jitao Sang (桑基韬) 2
- Zhenyang Cai 1
- Xiaoyi Chen 1
- Minghui Chen 1
- Junying Chen 1
- Yuanting Chen 1
- Liangyi Chen 1
- Zhe Cui 1
- Wenlong Deng 1
- Shuzhi Deng 1
- Zixin Ding 1
- Xiangyi Feng 1
- Yifei Gao 1
- Boying Gong 1
- Rui Hu 1
- Siqiao Huang 1
- Shan Jiang 1
- Liang Jingyi 1
- Xiaoxiao Li 1
- Yanchao Li 1
- Jianquan Li 1
- Zichen Liang 1
- Shengping Liu 1
- Bohan Lyu 1
- Tianyi Ma 1
- Delai Qiu 1
- Qian Sun 1
- Christos Thrampoulidis 1
- Yining Wang 1
- Junwen Wang 1
- Benyou Wang 1
- Shuyu Wei 1
- Jiang Wu 1
- Xieruiqiii 1
- Yifan Yang 1
- Yunjin Yang 1
- Jiajun You 1
- Qi Zeng 1
- Ziyi Zeng 1
- Junjie Zhao 1