Dong Wang


2025

Large Reasoning Models (LRMs) perform strongly on complex reasoning tasks via Chain-of-Thought (CoT) prompting, but often produce verbose outputs that increase computational overhead. Existing fine-tuning-based compression methods either apply post-hoc pruning, which risks disrupting reasoning coherence, or rely on sampling-based selection, which fails to remove redundant content thoroughly. To address these limitations, this work first frames, from a confidence-guided perspective, two key patterns of redundant reflection in LRMs: Confidence Deficit, wherein the model reflects on correct intermediate steps, and Termination Delay, wherein reflection continues after a verified, confident answer. Based on this, we introduce ConCISE (Confidence-guided Compression In Step-by-step Efficient Reasoning), a framework for generating concise reasoning chains that integrates Confidence Injection to boost reasoning confidence and Early Stopping to terminate reasoning when confidence is sufficient. Extensive experiments demonstrate that, compared to baseline methods, fine-tuning LRMs on ConCISE-generated data yields a better balance between compression and task performance, reducing length by up to approximately 50% under SimPO while maintaining high task accuracy.
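
The Early Stopping idea described in this abstract can be illustrated with a minimal sketch. This is not the ConCISE implementation: the function name `early_stop_chain`, the `confidence_of` callback, and the `threshold` value are all hypothetical, and the abstract does not say how confidence is actually measured. Confidence Injection is not modeled here.

```python
from typing import Callable, Iterable, List


def early_stop_chain(
    steps: Iterable[str],
    confidence_of: Callable[[List[str]], float],
    threshold: float = 0.9,
) -> List[str]:
    """Collect reasoning steps until a confidence score reaches a threshold.

    All names are hypothetical. `confidence_of` stands in for whatever
    confidence estimate a real system would use; the source does not
    specify one.
    """
    chain: List[str] = []
    for step in steps:
        chain.append(step)
        # Stop as soon as the chain so far is judged confident enough,
        # so later (possibly redundant) reflection steps are never kept.
        if confidence_of(chain) >= threshold:
            break
    return chain


# Toy usage: confidence grows linearly with the number of steps taken.
toy_steps = ["set up equation", "solve for x", "re-check step 1", "re-check step 2"]
toy_confidence = lambda chain: len(chain) / 4
short_chain = early_stop_chain(toy_steps, toy_confidence, threshold=0.5)
```

In this toy setup the two trailing "re-check" steps play the role of Termination Delay: they arrive after the confidence threshold has already been met and are therefore dropped.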

We introduce ProcWorld, a large-scale benchmark for partially observable embodied spatial reasoning and long-term planning with large language models (LLMs) and vision language models (VLMs). ProcWorld features a wide range of challenging embodied navigation and object manipulation tasks, covering 16 task types, 5,000 rooms, and over 10 million evaluation trajectories with diverse data distributions. ProcWorld supports configurable observation modes, ranging from text-only descriptions to vision-only observations, and lets text-based actions control an agent that follows language instructions. ProcWorld presents three significant challenges for LLMs and VLMs: (1) active information gathering given partial observations for disambiguation; (2) simultaneous localization and decision-making by tracking the spatio-temporal state-action distribution; (3) constrained reasoning with dynamic states subject to physical reachability. Our extensive evaluation of 15 foundation models and 5 reasoning algorithms (with over 1 million rollouts) indicates that larger models perform better. However, ProcWorld remains highly challenging for existing state-of-the-art models and in-context learning methods due to constrained reachability and the need for combinatorial spatial reasoning.

Co-authors

Xiao Ma 1

Ju Ren 1

Lai Wei 1

Venues

EMNLP 2
