Chenyu Zhu
2026
I2E: From Image Pixels to Actionable Interactive Environments for Text-Guided Image Editing
Jinghan Yu | Junhao Xiao | Chenyu Zhu | Jiaming Li | Jia Li | HanMing Deng | Xirui Wang | Guoli Jia | Jianjun Li | Xiang Bai | Bowen Zhou | Zhiyuan Ma
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Existing text-guided image editing methods primarily rely on an end-to-end, pixel-level inpainting paradigm. Despite its success in simple scenarios, this paradigm still struggles significantly with compositional editing tasks that require precise local control and complex multi-object spatial reasoning. It is severely limited by 1) the implicit coupling of planning and execution, 2) the lack of object-level control granularity, and 3) the reliance on unstructured, pixel-centric modeling. To address these limitations, we propose I2E, a novel "Decompose-then-Action" paradigm that reframes image editing as an actionable interaction process within a structured environment. I2E uses a Decomposer to transform unstructured images into discrete, manipulable object layers, and then introduces a physics-aware Vision-Language-Action Agent that parses complex instructions into a series of atomic actions via Chain-of-Thought reasoning. We also construct I2E-Bench, a benchmark designed for multi-instance spatial reasoning and high-precision editing. Experimental results on I2E-Bench and multiple public benchmarks demonstrate that I2E significantly outperforms state-of-the-art methods in handling complex compositional instructions, maintaining physical plausibility, and ensuring stability across multi-turn editing.
DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection
Junchao Wu | Yefeng Liu | Chenyu Zhu | Hao Zhang | Zeyu Wu | Tianqi Shi | Yichao Du | Longyue Wang | Weihua Luo | Jinsong Su | Derek F. Wong
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
The effective detection and governance of content generated by Large Language Models (LLMs) has become increasingly critical due to the growing risk of misuse. Despite the impressive performance of existing detectors, their reliability and potential in multilingual, real-world scenarios remain largely underexplored. In this study, we introduce DetectRL-X, a comprehensive multilingual benchmark designed to evaluate advanced detectors across 8 dimensions. The benchmark encompasses 8 languages commonly used in commercial contexts and collects human-written texts from 6 domains highly susceptible to LLM misuse. To better align with real-world applications, we create LLM-generated texts using 4 popular commercial LLMs and include typical AI-assisted writing operations such as polishing, expanding, and condensing to capture authentic usage patterns. Furthermore, we develop a multilingual framework for paraphrasing and perturbation attacks that simulates diverse human modifications and writing noise, enabling stress testing of detectors across languages. Experimental results on DetectRL-X reveal the strengths and limitations of current state-of-the-art detectors when applied to diverse linguistic resources. We further analyze how domains, generators, attack strategies, text length, and refinement operations influence performance in different languages, underscoring DetectRL-X as an effective benchmark for strengthening multilingual and language-specific detectors.
2025
Marco-Bench-MIF: On Multilingual Instruction-Following Capability of Large Language Models
Bo Zeng | Chenyang Lyu | Sinuo Liu | Mingyan Zeng | Minghao Wu | Xuanfan Ni | Tianqi Shi | Yu Zhao | Yefeng Liu | Chenyu Zhu | Ruizhe Li | Jiahui Geng | Qing Li | Yu Tong | Longyue Wang | Weihua Luo | Kaifu Zhang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Instruction-following capability has become a key ability to evaluate in Large Language Models. However, existing datasets such as IFEval are either predominantly monolingual and English-centric or simply machine-translated into other languages, limiting their applicability in multilingual contexts. In this paper, we present a carefully curated, localized multilingual extension of IFEval named Marco-Bench-MIF, covering 30 languages with varying levels of localization. Our benchmark addresses linguistic constraints (e.g., modifying capitalization requirements for Chinese) and cultural references (e.g., substituting region-specific company names in prompts) via a hybrid pipeline combining translation with verification. Through a comprehensive evaluation of 20+ LLMs on Marco-Bench-MIF, we find: (1) a 25-35% accuracy gap between high- and low-resource languages; (2) that model scale has a large impact on performance (45-60%), yet script-specific challenges persist; and (3) that machine-translated data underestimates accuracy by 7-22% relative to localized data. Our analysis identifies challenges in multilingual instruction following, including preserving keyword consistency and adhering to compositional constraints across languages. Marco-Bench-MIF will be made publicly available to the community.
Co-authors
- Yefeng Liu 2
- Weihua Luo 2
- Tianqi Shi 2
- Longyue Wang 2
- Xiang Bai 1
- HanMing Deng 1
- Yichao Du 1
- Jiahui Geng 1
- Guoli Jia 1
- Jia Li 1
- Jiaming Li 1
- Jianjun Li 1
- Qing Li 1
- Ruizhe Li 1
- Sinuo Liu 1
- Chenyang Lyu 1
- Zhiyuan Ma 1
- Xuanfan Ni 1
- Jinsong Su 1
- Yu Tong 1
- Xirui Wang 1
- Derek F. Wong (黄辉) 1
- Junchao Wu 1
- Minghao Wu 1
- Zeyu Wu 1
- Junhao Xiao 1
- Jinghan Yu 1
- Bo Zeng 1
- Mingyan Zeng 1
- Hao Zhang 1
- Kaifu Zhang 1
- Yu Zhao 1
- Bowen Zhou 1
Venues
- ACL 3