Xiangfeng Wang
2026
Evolving Agentic Workflow Driven by Human-Agent Collaboration
Yuxin Liu | Jinxuan Zhang | Yuezhang Peng | Hefeng Zhou | Xiangfeng Wang | Jiong Lou | Chentao Wu | Jie LI | Jingjing Qu | Chaochao Lu
Findings of the Association for Computational Linguistics: ACL 2026
Agentic workflows, composed of multiple collaborating Large Language Models (LLMs), have become a key paradigm for complex problem-solving. However, their effectiveness is often hindered by three critical challenges: high manual design costs, inefficient agentic search, and poor dynamic adaptability to new tasks and human preferences. To address these limitations, we propose HFlow, an evolutionary framework for generating agentic workflows through human-agent collaboration. HFlow employs an evolutionary algorithm to automate the search for optimal workflows by mutating and crossing over their structures, prompts, and LLM backbones. This process is guided by human preferences to ensure rapid convergence, while a hierarchical experience memory enables the generalization of learned strategies. Extensive experiments on math and code generation benchmarks show HFlow surpasses other automated baselines by up to 27.34%, while achieving comparable performance to o1-preview at only one-fourth of the cost. Our work introduces a new paradigm for workflow design that produces cost-effective and adaptive solutions, better aligning automated agentic systems with dynamic human needs.
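The evolutionary search described in this abstract can be illustrated with a toy sketch. It assumes a workflow is a fixed-length sequence of agent roles plus one prompt style and one backbone; the option lists, the `Workflow` class, and `toy_fitness` (a stand-in for benchmark score, human preference, and cost) are hypothetical placeholders and are not taken from HFlow itself.

```python
import random
from dataclasses import dataclass, replace

# Hypothetical search space; HFlow's real structures, prompts, and backbones differ.
STEPS = ["plan", "solve", "verify", "refine"]
PROMPTS = ["concise", "step-by-step", "self-check"]
BACKBONES = ["small-llm", "large-llm"]

@dataclass(frozen=True)
class Workflow:
    steps: tuple    # workflow structure: ordered agent roles
    prompt: str     # prompt style used by the agents
    backbone: str   # LLM that backs the agents

def random_workflow(rng):
    steps = tuple(rng.choice(STEPS) for _ in range(3))
    return Workflow(steps, rng.choice(PROMPTS), rng.choice(BACKBONES))

def mutate(w, rng):
    """Change exactly one gene: one step, the prompt, or the backbone."""
    gene = rng.choice(["steps", "prompt", "backbone"])
    if gene == "steps":
        steps = list(w.steps)
        steps[rng.randrange(len(steps))] = rng.choice(STEPS)
        return replace(w, steps=tuple(steps))
    if gene == "prompt":
        return replace(w, prompt=rng.choice(PROMPTS))
    return replace(w, backbone=rng.choice(BACKBONES))

def crossover(a, b, rng):
    """One-point crossover on structure; prompt and backbone come from either parent."""
    cut = rng.randrange(1, len(a.steps))
    return Workflow(a.steps[:cut] + b.steps[cut:],
                    rng.choice([a.prompt, b.prompt]),
                    rng.choice([a.backbone, b.backbone]))

def toy_fitness(w):
    task_score = len(set(w.steps)) / len(w.steps)      # stand-in for benchmark accuracy
    human_pref = 0.5 if "verify" in w.steps else 0.0   # stand-in for human feedback
    cost = 0.2 if w.backbone == "large-llm" else 0.0   # stand-in for inference cost
    return task_score + human_pref - cost

def evolve(fitness, generations=20, pop_size=8, seed=0):
    rng = random.Random(seed)
    pop = [random_workflow(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                 # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = parents + children
    return max(pop, key=fitness)

best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the better half of each generation is carried over unchanged, the best fitness in the population never decreases. The hierarchical experience memory mentioned in the abstract is not modelled here.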
PRIME: A Process-Outcome Alignment Benchmark for Verifiable Reasoning in Mathematics and Engineering
Xiangfeng Wang | Hangyu Guo | Yanlin Lai | Mitt Huang | Liang Zhao | Chengyuan Yao | Yinmin Zhang | Qi Han | Xiaoxiaoren | Chun Yuan | Tong Xu | Zheng Ge | Xiangyu Zhang | Daxin Jiang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
While model-based verifiers are essential for scaling Reinforcement Learning with Verifiable Rewards (RLVR), current outcome-centric verification paradigms primarily focus on the consistency between the final result and the ground truth, often neglecting potential errors in the derivation process. This leads to assigning positive rewards to correct answers produced from incorrect derivations. To bridge this gap, we introduce **PRIME**, a benchmark for evaluating verifiers on **PR**ocess-outcome alignment verification **I**n **M**athematics and **E**ngineering. Curated from a comprehensive collection of college-level STEM problems, **PRIME** comprises 2,530 high-difficulty samples through a consistency-based filtering pipeline. Through extensive evaluation, we find that current verifiers frequently fail to detect derivation flaws. Furthermore, we propose a process-aware RLVR training paradigm utilizing verifiers selected via **PRIME**. This approach substantially outperforms the outcome-only verification baseline, achieving absolute performance gains of **8.29%**, **9.12%**, and **7.31%** on AIME24, AIME25, and Beyond-AIME, respectively, for the Qwen3-14B-Base model. Finally, we demonstrate a strong linear correlation (R² > 0.92) between verifier accuracy on **PRIME** and RLVR training effectiveness, validating **PRIME** as a reliable predictor for verifier selection.
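The gap this abstract describes, a correct answer reached through a flawed derivation, can be made concrete with a minimal sketch. It assumes binary rewards and a boolean process verdict; the function names and the `derivation_ok` flag are hypothetical, and the paper's actual reward design may differ.

```python
def outcome_only_reward(answer, ground_truth):
    """Reward depends only on whether the final answer matches."""
    return 1.0 if answer == ground_truth else 0.0

def process_aware_reward(answer, ground_truth, derivation_ok):
    """Reward requires a matching answer AND a derivation the verifier accepts.

    `derivation_ok` stands in for the verdict of a process-level verifier.
    """
    return 1.0 if (answer == ground_truth and derivation_ok) else 0.0

# A correct final answer produced by an incorrect derivation.
sample = {"answer": "42", "ground_truth": "42", "derivation_ok": False}

print(outcome_only_reward(sample["answer"], sample["ground_truth"]))
print(process_aware_reward(sample["answer"], sample["ground_truth"],
                           sample["derivation_ok"]))
```

The two functions disagree only on samples where the answer is right and the derivation is wrong, which is the kind of case the benchmark is built to probe.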
Agentic Episodic Control
Xidong Yang | Wenhao Li | Junjie Sheng | Yun Hua | Haosheng Chen | Chuyun Shen | Xiangfeng Wang
Findings of the Association for Computational Linguistics: ACL 2026
Reinforcement learning (RL) remains fundamentally limited by poor data efficiency and weak generalization. Prior episodic RL methods attempt to alleviate this via external memory modules, yet they suffer from two key limitations: a representation bottleneck caused by shallow encoders, and a retrieval dilemma where episodic memory is accessed indiscriminately. To address these challenges, we propose Agentic Episodic Control (AEC), a novel architecture that integrates large language models (LLMs) into episodic RL. AEC uses an LLM-based semantic augmenter to generate semantic representations from raw observations, and a critical state recognizer to selectively retrieve valuable experiences. This transforms memory usage from passive similarity matching into strategic, context-aware recall. Across five BabyAI-Text environments, AEC achieves 2–6× higher data efficiency than baselines and is the only method to solve complex tasks like UnlockLocal with over 90% success. It further demonstrates strong cross-task and cross-environment generalization, maintaining performance even under distribution shifts. AEC shows that combining LLM-derived priors with reinforcement learning yields more sample-efficient and adaptable agents. Code is available at https://github.com/Xidong-Yang/Agentic_Episodic_Control.
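The idea of gating memory access on critical states can be sketched as follows. This is an illustration only: `toy_embed` replaces the LLM-based semantic augmenter with a bag-of-words vector, `is_critical` replaces the critical state recognizer with a keyword check, and all names are hypothetical rather than taken from the linked repository.

```python
import math

VOCAB = ["door", "key", "ball", "locked"]   # hypothetical toy vocabulary

def toy_embed(text):
    """Stand-in for an LLM-derived semantic representation."""
    return [text.count(word) for word in VOCAB]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

class EpisodicMemory:
    def __init__(self):
        self.items = []                      # (key vector, action, return)

    def write(self, key, action, ret):
        self.items.append((key, action, ret))

    def nearest(self, key):
        """Return the stored entry whose key is most similar to `key`."""
        return max(self.items, key=lambda item: cosine(key, item[0]))

def act(obs, memory, embed, is_critical, policy):
    """Consult memory only in critical states; otherwise use the base policy."""
    if is_critical(obs) and memory.items:
        _, action, _ = memory.nearest(embed(obs))
        return action
    return policy(obs)

memory = EpisodicMemory()
memory.write(toy_embed("the door is locked"), "pickup key", 1.0)
memory.write(toy_embed("a red ball"), "go to ball", 0.5)

is_critical = lambda obs: "locked" in obs    # stand-in for the recognizer
policy = lambda obs: "explore"               # stand-in for the learned policy

print(act("locked door ahead", memory, toy_embed, is_critical, policy))
print(act("a red ball", memory, toy_embed, is_critical, policy))
```

The second call never touches memory even though a matching entry exists, which is the contrast with indiscriminate similarity lookup that the abstract draws.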
2025
From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding
Xiangfeng Wang | Xiao Li | Yadong Wei | Songxueyu | Yang Song | Xiaxiaoqiang | Fangrui Zeng | Zaiyi Chen | Liuliu | Gu Xu | Tong Xu
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
The rapid growth of online video content, especially on short video platforms, has created a growing demand for efficient video editing techniques that can condense long-form videos into concise and engaging clips. Existing automatic editing methods predominantly rely on textual cues from ASR transcripts and end-to-end segment selection, often neglecting the rich visual context and leading to incoherent outputs. In this paper, we propose a Human-Inspired automatic video editing framework (HIVE) that leverages multimodal narrative understanding to address these limitations. Our approach incorporates character extraction, dialogue analysis, and narrative summarization through multimodal large language models, enabling a holistic understanding of the video content. To further enhance coherence, we apply scene-level segmentation and decompose the editing process into three subtasks: highlight detection, opening/ending selection, and pruning of irrelevant content. To facilitate research in this area, we introduce DramaAD, a novel benchmark dataset comprising over 2500 short drama episodes and 500 professionally edited advertisement clips. Experimental results demonstrate that our framework consistently outperforms existing baselines across both general and advertisement-oriented editing tasks, significantly narrowing the quality gap between automatic and human-edited videos.
2024
In-Context Former: Lightning-fast Compressing Context for Large Language Model
Xiangfeng Wang | Zaiyi Chen | Tong Xu | Zheyong Xie | Yongyi He | Enhong Chen
Findings of the Association for Computational Linguistics: EMNLP 2024
With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach to mitigate these costs is compressing the long input contexts. Existing methods typically leverage the self-attention mechanism of the large model itself for context compression. While these methods have achieved notable results, the compression process still entails quadratic complexity. To mitigate this limitation, we propose the In-Context Former (IC-Former). This method does not rely on the target large model but instead utilizes cross-attention mechanisms to extract and condense information from the contextual embeddings. The computational overhead of our method grows linearly with the compression range. Experimental results indicate that our method requires only 1/32 of the floating-point operations of the baseline during compression and improves processing speed by 68 to 112 times while achieving 90% of the baseline performance on evaluation metrics. Additionally, IC-Former demonstrates strong regularity in its interactions with the context, enhancing its interpretability. Overall, IC-Former significantly reduces compression costs, making real-time compression scenarios feasible.
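The linear-cost claim in this abstract can be illustrated with a bare cross-attention sketch: a fixed set of k query vectors attends over n context embeddings, so the work scales with k × n rather than n². The function below is an assumption-laden simplification (single head, no learned projections, pure Python) and is not the IC-Former architecture; `cross_attention_compress` and the toy inputs are hypothetical.

```python
import math

def cross_attention_compress(queries, context):
    """Compress len(context) vectors into len(queries) vectors.

    Each query takes a softmax-weighted average of the context vectors.
    Work is proportional to len(queries) * len(context) * d.
    """
    d = len(context[0])
    compressed = []
    for q in queries:
        # Scaled dot-product score between this query and every context vector.
        scores = [sum(qi * ci for qi, ci in zip(q, c)) / math.sqrt(d)
                  for c in context]
        top = max(scores)                               # subtract max for stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        compressed.append([
            sum(w * c[j] for w, c in zip(weights, context)) / total
            for j in range(d)
        ])
    return compressed

# 6 context embeddings of dimension 2, compressed into 2 vectors.
context = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5], [1.0, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]    # stand-ins for learnable digest queries
digest = cross_attention_compress(queries, context)
print(len(digest), len(digest[0]))
```

With k fixed, doubling the context length doubles the number of score computations, whereas self-attention over the context would quadruple them.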
Co-authors
- Tong Xu 3
- Zaiyi Chen 2
- Haosheng Chen 1
- Enhong Chen 1
- Zheng Ge 1
- Hangyu Guo 1
- Qi Han 1
- Yongyi He 1
- Yun Hua 1
- Mitt Huang 1
- Daxin Jiang 1
- Jie LI 1
- Yanlin Lai 1
- Wenhao Li 1
- Xiao Li 1
- Yuxin Liu 1
- Liuliu 1
- Jiong Lou 1
- Chaochao Lu 1
- Yuezhang Peng 1
- Jingjing Qu 1
- Chuyun Shen 1
- Junjie Sheng 1
- Yang Song 1
- Songxueyu 1
- Yadong Wei 1
- Chentao Wu 1
- Xiaoxiaoren 1
- Xiaxiaoqiang 1
- Zheyong Xie 1
- Gu Xu 1
- Xidong Yang 1
- Chengyuan Yao 1
- Chun Yuan 1
- Fangrui Zeng 1
- Jinxuan Zhang 1
- Yinmin Zhang 1
- Xiangyu Zhang 1
- Liang Zhao (赵亮) 1
- Hefeng Zhou 1