Zixin Ding
2026
AEGIS: A Holistic Benchmark for Evaluating Forensic Analysis of AI-Generated Academic Images
Bo Zhang | Tzu-Yen Ma | Zichen Tang | Junpeng Ding | Zirui Wang | Yizhuo Zhao | Peilin Gao | Zijie Xi | Zixin Ding | Haiyang Sun | Haocheng Gao | Yuan Liu | Liangjia Wang | Yiling Huang | Yujie Wang | Yuyue Zhang | Ronghui Xi | Yuanze Li | Jiacheng Liu | Zhongjun Yang | Haihong E
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We introduce AEGIS, A holistic benchmark for Evaluating forensic analysis of AI-Generated academic ImageS. Compared to existing benchmarks, AEGIS features three key advances: (1) Domain-Specific Complexity: covering seven academic categories with 39 fine-grained subtypes, exposing intrinsic forensic difficulty, where even GPT-5.1 reaches 48.80% overall performance and expert models achieve only limited localization accuracy (IoU 30.09%); (2) Diverse Forgery Simulations: modeling four prevalent academic forgery strategies across 25 generative models, with 11 yielding average forensic accuracy below 50%, showing that forensics lag behind generative advances; and (3) Multi-Dimensional Forensic Evaluation: jointly assessing detection, reasoning, and localization, revealing complementary strengths between model families, with multimodal large language models (MLLMs) at 84.74% accuracy in textual artifact recognition and expert detectors peaking at 79.54% accuracy in binary authenticity detection. By evaluating 25 leading MLLMs, nine expert models, and one unified multimodal understanding and generation model, AEGIS serves as a diagnostic testbed exposing fundamental limitations in academic image forensics.
For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
Wenlong Deng | Qi Zeng | Jiaming Zhang | Minghui Chen | Zixin Ding | Christos Thrampoulidis | Boying Gong | Xiaoxiao Li
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Data valuation is essential for enhancing the transparency and accountability of large language models (LLMs) and vision-language models (VLMs). However, existing methods typically rely on gradient computations, making them computationally prohibitive for billion-parameter models and precluding batch parallelization. In this work, we introduce For-Value, a forward-only data valuation framework that enables efficient, batch-scalable value estimation while maintaining effectiveness. Leveraging the expressive power of pretrained LLMs/VLMs, we theoretically demonstrate that data valuation can be captured by the alignment between the final hidden representations and prediction errors at the last layer. In light of this insight, For-Value computes data value using a simple closed-form expression with a single forward pass, eliminating the need for costly backpropagation and enabling efficient batched computation at scale. Extensive experiments show that For-Value matches or outperforms gradient-based baselines in detecting influential data and mislabeled data, while achieving significant efficiency improvements.
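The abstract does not give the closed-form expression itself, so the following is only a minimal sketch of the general idea it describes: scoring a training sample by how well its final hidden representation and its last-layer prediction error align with those of validation samples. The function names, the use of `one_hot(label) - softmax(logits)` as the prediction error, and the product-of-similarities score are all assumptions made for illustration, not the paper's formula.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def prediction_error(logits, labels):
    # Assumed error definition: one-hot target minus predicted probabilities.
    probs = softmax(logits)
    onehot = np.eye(logits.shape[-1])[labels]
    return onehot - probs

def forward_only_scores(h_train, logits_train, y_train, h_val, logits_val, y_val):
    # All inputs come from forward passes only; no backpropagation is needed.
    e_train = prediction_error(logits_train, y_train)
    e_val = prediction_error(logits_val, y_val)
    hidden_sim = h_train @ h_val.T   # (n_train, n_val) hidden-state alignment
    error_sim = e_train @ e_val.T    # (n_train, n_val) error alignment
    # Hypothetical score: sum over validation samples of the two alignments.
    return (hidden_sim * error_sim).sum(axis=1)
```

Because the score is built from matrix products over a whole batch, it can be computed for many training samples at once, which is the property the abstract emphasizes.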
Automatic Prompt Engineering for Scalable Prompt Inversion in Text-to-Image Ad Generation
Zixin Ding | Qi Zeng | Boying Gong | Wenlong Deng | Bo Pan | Yuxin Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
While prompt engineering offers effective control over Text-to-Image (T2I) generation, it remains labor-intensive for large-scale production. We present PRISM-DUEL, a black-box framework that formalizes prompt optimization as Automatic Prompt Engineering (APE), motivated by advertising workflows requiring low-latency, diverse variants faithful to a human-designed ad. Since zero-shot LLMs are unreliable judges of image quality, PRISM-DUEL obtains label-free pairwise preferences and rationales from an LLM judge over pairs of generated images, then uses a dueling-bandit optimizer to optimize a prompt for generating controlled variations while matching the reference ad’s visual content. By iteratively steering the prompt distribution towards higher-quality generations and improving posterior calibration, PRISM-DUEL preserves visual similarity and semantic faithfulness while increasing diversity. Experiments on PartiPrompts and DreamBooth across Gemini 2.5 Flash Image, FLUX.1, and Qwen-Image show consistent gains over strong baselines in visual faithfulness and prompt interpretability.
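To illustrate what optimizing over pairwise preferences can look like, here is a generic dueling-bandit sketch using Thompson sampling over per-candidate win rates. It is not PRISM-DUEL's optimizer: the `judge` callable is a hypothetical stand-in for the LLM judge comparing generated images, and the candidate set, round count, and selection rule are assumptions chosen for brevity.

```python
import random

def dueling_bandit(candidates, judge, rounds=200, seed=0):
    """Pick a preferred candidate using only pairwise comparisons.

    judge(a, b) is a hypothetical callable returning True when a is preferred.
    """
    rng = random.Random(seed)
    n = len(candidates)
    wins = [0] * n
    losses = [0] * n
    for _ in range(rounds):
        # Thompson sampling: draw a plausible win rate for each candidate
        # from its Beta posterior, then duel the two highest draws.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n)]
        first, second = sorted(range(n), key=draws.__getitem__, reverse=True)[:2]
        if judge(candidates[first], candidates[second]):
            winner, loser = first, second
        else:
            winner, loser = second, first
        wins[winner] += 1
        losses[loser] += 1
    # Return the candidate with the highest posterior mean win rate.
    best = max(range(n), key=lambda i: (wins[i] + 1) / (wins[i] + losses[i] + 2))
    return candidates[best]
```

In a T2I setting, each candidate would be a prompt and `judge` would compare the images generated from the two prompts against the reference ad; here any deterministic or noisy preference function can be plugged in to try the loop.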
Co-authors
- Wenlong Deng 2
- Boying Gong 2
- Qi Zeng 2
- Minghui Chen 1
- Yuxin Chen 1
- Junpeng Ding 1
- Haihong E 1
- Haocheng Gao 1
- Peilin Gao 1
- Yiling Huang 1
- Xiaoxiao Li 1
- Yuanze Li 1
- Jiacheng Liu 1
- Yuan Liu 1
- Tzu-Yen Ma 1
- Bo Pan 1
- Haiyang Sun 1
- Zichen Tang 1
- Christos Thrampoulidis 1
- Liangjia Wang 1
- Yujie Wang 1
- Zirui Wang 1
- Ronghui Xi 1
- Zijie Xi 1
- Zhongjun Yang 1
- Bo Zhang 1
- Jiaming Zhang 1
- Yuyue Zhang 1
- Yizhuo Zhao 1
Venues
- ACL 3