Haoran Zhao
2026
Attribution-Based Analysis and Optimization of Modular Agentic Workflows
Yingxuan Yang | Bo Huang | Siyuan Qi | Chao Feng | Haoyi Hu | Yuxuan Zhu | Jinbo Hu | Haoran Zhao | Ziyi He | Xiao Liu | ZongYu Wang | Muning Wen | Lin Qiu | Xuezhi Cao | Xunliang Cai | Yong Yu | Weinan Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Agentic workflows solve complex tasks by orchestrating modular components (e.g., planning, reasoning, action, reflection) built on top of LLM backbones. A practical but underexplored question is model allocation: given a fixed workflow decomposition and a pool of candidate LLMs, which components should be upgraded (and with which models) to upgrade task performance, and how can we attribute gains to individual upgrades and their interactions? We present ShapleyFlow, a cooperative game theoretic framework that models component upgrades as players and evaluates component coalitions to compute Shapley values. This yields interaction-aware attribution and supports Shapley-guided configuration recommendation for model allocation under a fixed workflow structure. We further introduce CapaBench, a benchmark of 1,500+ tasks across seven domains (shopping, navigation, ticketing, mathematics, operating systems, robotic coordination, and automated theorem proving). Across 9 representative LLMs and all 2^4 upgrade coalitions in a 4-component workflow, ShapleyFlow provides (i) principled, interaction-aware attribution for modular workflows and (ii) actionable model-allocation recommendations that improve over strong single-model baselines.
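The attribution idea in this abstract can be illustrated with a minimal sketch of exact Shapley values over a four-player game, where each player is one component upgrade and a coalition is the set of upgraded components (16 subsets in total). This is not the authors' implementation: the component names are taken from the abstract's examples, and `toy_score`, `GAIN`, and the numbers in them are hypothetical stand-ins for a measured task success rate.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition.

    `value` maps a frozenset of players to a number (here: a task score).
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of upgrading p on top of coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Assumed component set (the abstract lists these only as examples).
COMPONENTS = ("planning", "reasoning", "action", "reflection")

# Hypothetical per-component gains; real values would come from evaluation runs.
GAIN = {"planning": 0.10, "reasoning": 0.06, "action": 0.04, "reflection": 0.02}


def toy_score(coalition):
    """Hypothetical score: baseline + additive gains + one interaction bonus."""
    score = 0.50 + sum(GAIN[c] for c in coalition)
    if "planning" in coalition and "reasoning" in coalition:
        score += 0.04  # assumed synergy between two upgrades
    return score


phi = shapley_values(COMPONENTS, toy_score)
for name, v in phi.items():
    print(f"{name}: {v:.3f}")
```

In this toy game the interaction bonus is split equally between the two components that create it, and the four values sum to the gap between the fully upgraded and the baseline score, which is the efficiency property that makes Shapley values usable for attribution.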
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Keke Lian | Wang Bin | Lei Zhang | Libo Chen | Junjie Wang | Ziming Zhao | Yujiu Yang | Miaoqian Lin | Haotong Duan | Haoran Zhao | Shuang Liao | Mingda Guo | Quan Jiazheng | Yilu Zhong | Chenhao He | Chen Zichuan | Jie Wu | Haoling Li | Zhaoxuan Li | Jiongchi Yu | Hui LI | Dong Zhang
Findings of the Association for Computational Linguistics: ACL 2026
The increasing adoption of large language models (LLMs) in software engineering necessitates rigorous security evaluation of their generated code. However, existing benchmarks often lack relevance to real-world AI-assisted programming scenarios, making them inadequate for assessing the practical security risks associated with AI-generated code in production environments. To address this gap, we introduce A.S.E (AI Code Generation Security Evaluation), a repository-level evaluation benchmark designed to closely mirror real-world AI programming tasks, offering a comprehensive and reliable framework for assessing the security of AI-generated code. Our evaluation of leading LLMs on A.S.E reveals several key findings. In particular, current LLMs still struggle with secure coding. The complexity in repository-level scenarios presents challenges for LLMs that typically perform well on snippet-level tasks. Moreover, a larger reasoning budget does not necessarily lead to better code generation. These observations offer valuable insights into the current state of AI code generation and help developers identify the most suitable models for practical tasks. They also lay the groundwork for refining LLMs to generate secure and efficient code in real-world applications.
2025
Comparing human and LLM politeness strategies in free production
Haoran Zhao | Robert D. Hawkins
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Polite speech poses a fundamental alignment challenge for large language models (LLMs). Humans deploy a rich repertoire of linguistic strategies to balance informational and social goals – from positive approaches that build rapport (compliments, expressions of interest) to negative strategies that minimize imposition (hedging, indirectness). We investigate whether LLMs employ a similarly context-sensitive repertoire by comparing human and LLM responses to English-language scenarios in both constrained and open-ended production tasks. We find that larger models (≥70B parameters) successfully replicate key effects from the computational pragmatics literature, and human evaluators prefer LLM-generated responses in open-ended contexts. However, further linguistic analyses reveal that models disproportionately rely on negative politeness strategies to create distance even in positive contexts, potentially leading to misinterpretations. While LLMs thus demonstrate an impressive command of politeness strategies, these systematic differences provide important groundwork for making intentional choices about pragmatic behavior in human-AI communication.
Co-authors
- Wang Bin 1
- Xunliang Cai 1
- Xuezhi Cao 1
- Libo Chen 1
- Haotong Duan 1
- Chao Feng 1
- Mingda Guo 1
- Robert D. Hawkins 1
- Ziyi He 1
- Chenhao He 1
- Haoyi Hu 1
- Jinbo Hu 1
- Bo Huang 1
- Quan Jiazheng 1
- Hui LI 1
- Haoling Li 1
- Zhaoxuan Li 1
- Keke Lian 1
- Shuang Liao 1
- Miaoqian Lin 1
- Xiao Liu 1
- Siyuan Qi 1
- Lin Qiu 1
- ZongYu Wang 1
- Junjie Wang 1
- Muning Wen 1
- Jie Wu 1
- Yingxuan Yang 1
- Yujiu Yang 1
- Yong Yu 1
- Jiongchi Yu 1
- Weinan Zhang 1
- Lei Zhang 1
- Dong Zhang 1
- Ziming Zhao 1
- Yilu Zhong 1
- Yuxuan Zhu 1
- Chen Zichuan 1