Yue Cui
2026
Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory
Shiqi He | Yue Cui | Xinyu Ma | Yaliang Li | Bolin Ding | Mosharaf Chowdhury
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Autonomous web agents powered by large language models (LLMs) show strong potential for performing goal-oriented tasks such as information retrieval, report generation, and online transactions. These agents mark a key step toward practical embodied reasoning in open web environments. However, existing approaches remain limited in reasoning depth and efficiency: vanilla linear methods fail at multi-step reasoning and lack effective backtracking, while other search strategies are coarse-grained and computationally costly. We introduce Branch-and-Browse, a fine-grained web agent framework that unifies structured reasoning-acting, contextual memory, and efficient execution. It (i) employs explicit subtask management with tree-structured exploration for controllable multi-branch reasoning, (ii) bootstraps exploration through efficient web state replay with background reasoning, and (iii) leverages a page action memory to share explored actions within and across sessions. On the WebArena benchmark, Branch-and-Browse achieves a task success rate of 35.8% and reduces execution time by up to 40.4% relative to state-of-the-art methods. These results demonstrate that Branch-and-Browse is a reliable and efficient framework for LLM-based web agents. Code is available at https://anonymous.4open.science/r/Branch_and_Browse/.
RSDA: Restoring Stale Data Affinity via Dynamic Renovation Strategy for Mitigating Data Scarcity
Yidan Liang | Jia Zhu | Weijie Shi | Hanghui Guo | Yue Cui | Jiawei Shen | Guoqing Ma | Jingjiang Liu | Qingyu Niu | Yilin Wang | Shimin Di | Jiajie Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
High-quality data is the cornerstone of advancing large language models. However, the field currently faces a critical dilemma: the supply of premium data is nearing depletion, while vast stale corpora remain underutilized. Our empirical analysis reveals that training models on such data directly often leads to performance degradation. We attribute this phenomenon to the data affinity gap, a misalignment stemming from the model’s inability to effectively comprehend the data or inherent quality defects. To bridge this gap, we propose the Restoring Stale Data Affinity (RSDA) framework. First, utilizing our proposed potential entropy metric, RSDA quantifies the latent value of samples to effectively identify stale data with higher renovation potential. Subsequently, the framework employs a dynamic renovation strategy selection mechanism to determine the optimal component-level strategy for each instance, transforming low-affinity stale samples into high-quality training data. Comprehensive experimental results demonstrate that RSDA effectively enhances data affinity, achieving performance improvements using less than 10% of the data volume, thereby underscoring that the latent potential of stale corpora remains largely untapped. The code is available at https://github.com/wenfiii/RSDA.
ACR: Adaptive Context Refactoring via Context Refactoring Operators for Multi-Turn Dialogue
Jiawei Shen | Jia Zhu | Hanghui Guo | Weijie Shi | Yue Cui | Qingyu Niu | Guoqing Ma | Jingjiang Liu | Yidan Liang | Yilin Wang | Shimin Di | Jiajie Xu
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) have shown remarkable performance in multi-turn dialogue. However, models still struggle to stay aligned with what has been established earlier, follow dependencies across many turns, and avoid drifting into incorrect facts as the interaction grows longer. Existing approaches primarily focus on extending the context window, introducing external memory, or applying context compression, yet these methods still face limitations such as contextual inertia and state drift. To address these challenges, we propose the Adaptive Context Refactoring (ACR) Framework, which dynamically monitors and reshapes the interaction history to actively mitigate contextual inertia and state drift. ACR is built on a library of context refactoring operators and a teacher-guided self-evolving training paradigm that learns when to intervene and how to refactor, thereby decoupling context management from the reasoning process. Extensive experiments on multi-turn dialogue demonstrate that our method significantly outperforms existing baselines while reducing token consumption. Our code is available at https://github.com/ClannadKno/multi-turn.
ReTRE: Benchmarking LLM Transfer Robustness with Structure-Preserving Variants
ZhongDong Li | Weijie Shi | Yue Cui | Haolun MA | Yuanjun Liu | Jiawei Li | An Liu | Jia Zhu | Jiajie Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have achieved strong performance on standard benchmarks, yet their performance is not robust across different task manifestations. It remains unclear how performance changes under controlled task rewrites that preserve the original solution structure while varying the rewrite type and level. To address this question, we introduce ReTRE (Rewrite-based Transfer Robustness Evaluation), an evaluation benchmark inspired by learning transfer theory that probes transfer robustness along two rewrite levels: Near Transfer and Far Transfer. ReTRE employs a multi-agent system to construct textual and visual variants while preserving the structure of the original solution. Evaluations on mathematical and science tasks across state-of-the-art multimodal LLMs reveal a consistent transfer gap: performance generally declines as transfer similarity drops, and strong text performance can degrade under cross-modal transfer. Crucially, we identify a divergence between post-training paradigms: reinforcement learning preserves transfer robustness, whereas supervised fine-tuning tends to overfit the training distribution, leading to severe degradation in far-transfer performance despite strong in-distribution accuracy.
KCVR: Knowledge-Centric Video Reconstruction for Structured Pedagogical Summarization via Dynamic Graph Planning
Jingjiang Liu | Jia Zhu | Hanghui Guo | Weijie Shi | Yue Cui | Xiaokang Jin | Yilin Wang | Qingyu Niu | Jiawei Shen | Guoqing Ma | Yidan Liang | Shimin Di | Jiajie Xu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Existing video summarization methods mainly compress content for gist browsing, but they often break the prerequisite logic in instructional videos and induce logical inversions (e.g., conclusions before premises). We formalize this problem as Structure-Pedagogical Reconstruction (SPR). SPR raises two challenges: (1) Structure Hallucination, where retrieved knowledge is topologically valid but not evidence-grounded by the blackboard; and (2) Logical Inversion, where soft prompt-level graph injection fails to enforce prerequisite order during decoding. To address these challenges, we propose Knowledge-Centric Video Reconstruction (KCVR), a Plan-then-Generate neuro-symbolic framework that decouples epistemic planning from content generation. KCVR prunes a Dual-Layer Epistemic Graph into a minimal video-supported plan, then realizes the plan with visually anchored attention and topology-constrained decoding. We additionally release EduStruct, a 10-discipline benchmark for SPR and structure-centric evaluation. Experiments show that KCVR outperforms strong end-to-end baselines on Knowledge Progression Consistency and Learning Objective Coverage. Our code and data are available at https://github.com/mark1001-ljj/video_sum.
2025
DIDS: Domain Impact-aware Data Sampling for Large Language Model Training
Weijie Shi | Jipeng Zhang | Yaguang Wu | Jingzhi Fang | Shibo Zhang | Yao Zhao | Hao Chen | Ruiyuan Zhang | Yue Cui | Jia Zhu | Sirui Han | Jiajie Xu | Xiaofang Zhou
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) are commonly trained on multi-domain datasets, where domain sampling strategies significantly impact model performance due to varying domain importance across downstream tasks. Existing approaches for optimizing domain-level sampling strategies struggle with maintaining intra-domain consistency and accurately measuring domain impact. In this paper, we present Domain Impact-aware Data Sampling (DIDS). To ensure intra-domain consistency, a gradient clustering algorithm is proposed to group training data based on their learning effects, where a proxy language model and dimensionality reduction are employed to reduce computational overhead. To accurately measure domain impact, we develop a Fisher Information Matrix (FIM) guided metric that quantifies how domain-specific parameter updates affect the model’s output distributions on downstream tasks, with theoretical guarantees. Furthermore, to determine optimal sampling ratios, DIDS combines both the FIM-guided domain impact assessment and loss learning trajectories that indicate domain-specific potential, while accounting for diminishing marginal returns. Extensive experiments demonstrate that DIDS achieves 3.4% higher average performance while maintaining comparable training efficiency. The code is available at https://github.com/shiweijiezero/DIDS.
Enhancing Tool Learning in Large Language Models with Hierarchical Error Checklists
Yue Cui | Liuyi Yao | Shuchang Tao | Weijie Shi | Yaliang Li | Bolin Ding | Xiaofang Zhou
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) have significantly advanced natural language processing, particularly through the integration of external tools and APIs. However, their effectiveness is frequently hampered by parameter mis-filling during tool calling. In this paper, we propose the Hierarchical Tool Error Checklist (HiTEC) framework to systematically diagnose and mitigate tool-calling errors without relying on extensive real-world interactions. HiTEC introduces a two-tiered approach: a global error checklist that identifies common, cross-tool issues, and a local error checklist that targets tool-specific and contextual failures. Building on this structure, we propose two deployments: HiTEC-In Context Learning (HiTEC-ICL) and HiTEC-Kahneman-Tversky Optimization (HiTEC-KTO). HiTEC-ICL embeds the global checklist in the initial prompts and leverages a two-round conversational interaction to dynamically refine parameter handling, while HiTEC-KTO generates high-quality negative examples to drive fine-tuning via preference-based optimization. Extensive experiments across five public datasets demonstrate that our framework significantly improves parameter-filling accuracy and tool-calling success rates compared to baseline methods.
2022
CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform
Yue Cui | Junhui Zhu | Liner Yang | Xuezhi Fang | Xiaobin Chen | Yujie Wang | Erhong Yang
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The construct of linguistic complexity has been widely used in language learning research. Several text analysis tools have been created to automatically analyze linguistic complexity. However, the indexes supported by existing Chinese text analysis tools are limited and vary with each tool's research purpose. CTAP is an open-source linguistic complexity measurement extraction tool that serves any research purpose. Although it was originally developed for English, the Unstructured Information Management Architecture (UIMA) framework it uses allows the integration of other languages. In this study, we integrated the Chinese component into CTAP, describing the index sets it incorporates and comparing it with three linguistic complexity tools for Chinese. The index set includes 196 linguistic complexity indexes at four levels: character level, word level, sentence level, and discourse level. So far, CTAP has implemented automatic calculation of complexity characteristics for four languages, aiming to help linguists without an NLP background study language complexity.
Co-authors
- Weijie Shi 6
- Jiajie Xu 5
- Jia Zhu 5
- Shimin Di 3
- Hanghui Guo 3
- Yidan Liang 3
- Jingjiang Liu 3
- Guoqing Ma 3
- Qingyu Niu 3
- Jiawei Shen 3
- Yilin Wang 3
- Bolin Ding 2
- Yaliang Li 2
- Xiaofang Zhou 2
- Hao Chen 1
- Xiaobin Chen 1
- Mosharaf Chowdhury 1
- Jingzhi Fang 1
- Xuezhi Fang 1
- Sirui Han 1
- Shiqi He 1
- Xiaokang Jin 1
- ZhongDong Li 1
- Jiawei Li 1
- Yuanjun Liu 1
- An Liu 1
- Haolun MA 1
- Xinyu Ma 1
- Shuchang Tao 1
- Yujie Wang (誉杰 王) 1
- Yaguang Wu 1
- Liner Yang 1
- Erhong Yang 1
- Liuyi Yao 1
- Jipeng Zhang 1
- Shibo Zhang 1
- Ruiyuan Zhang 1
- Yao Zhao 1
- Junhui Zhu 1