Wang Xi

2026

Chain-of-thought (CoT) often improves multi-step reasoning, but it remains unclear what kind of additional sequential computation longer traces actually enable. We connect CoT to Bennett’s logical depth, separating an answer’s description length from the sequential effort required to derive it, and view a CoT budget of T steps as a qualitative cap on realizable sequential computation. To operationalize realized depth beyond raw length, we introduce Effective Logical Depth (ELD), a deletion-based measure of step necessity under a specified inference interface. Across depth-controlled prefix-sum tasks and GSM8K rationale perturbations, we observe two consistent signatures of a Time-for-Accuracy tradeoff: (i) plateau-to-transition accuracy curves as the budget increases from being below to matching the task’s required depth, and (ii) sparse, position-dependent deletion sensitivity concentrated in early steps for deeper instances. On GSM8K, an Extract interface, where the model reads off the answer from the remaining rationale, remains near-perfect even after prefix deletions, whereas a Repair interface, where the model must re-solve from truncated rationale context, degrades markedly. Moreover, Socratic human rationales are consistently more robust than Main rationales under Repair. These results suggest that longer CoT helps primarily when it enables additional effective sequential computation, and that deletion-based diagnostics can distinguish computational steps from redundant ones.
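
The deletion-based idea behind ELD can be illustrated with a minimal sketch. The abstract does not give the exact definition, so everything here is an assumption for illustration: the step format, the toy `sum_interface`, and the choice to count a step as necessary when deleting it alone changes the interface's answer are hypothetical and not the authors' implementation.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical step format: ("add", value) does computation, ("note", value) is redundant.
Step = Tuple[str, int]
Interface = Callable[[Sequence[Step]], int]


def sum_interface(steps: Sequence[Step]) -> int:
    """Toy inference interface (assumption): derive the answer from the 'add' steps that remain."""
    return sum(value for kind, value in steps if kind == "add")


def deletion_sensitivity(steps: Sequence[Step], interface: Interface, gold: int) -> List[bool]:
    """For each step, report whether deleting that step alone makes the answer wrong."""
    flags = []
    for i in range(len(steps)):
        reduced = list(steps[:i]) + list(steps[i + 1:])
        flags.append(interface(reduced) != gold)
    return flags


def effective_depth(steps: Sequence[Step], interface: Interface, gold: int) -> int:
    """Count the steps whose deletion breaks an otherwise correct answer (0 if already wrong)."""
    if interface(steps) != gold:
        return 0
    return sum(deletion_sensitivity(steps, interface, gold))


trace: List[Step] = [("add", 3), ("note", 0), ("add", 4), ("add", 5)]
flags = deletion_sensitivity(trace, sum_interface, gold=12)
depth = effective_depth(trace, sum_interface, gold=12)
```

In this toy trace the raw length is four steps, but only the three `add` steps are flagged as necessary, which is the kind of gap between trace length and realized depth that a deletion-based diagnostic is meant to expose. Swapping in a different interface function would correspond to measuring necessity under a different inference interface.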

2024

Retrieval-Augmented Large Language Models (RALMs) have made significant strides in enhancing the accuracy of generated responses. However, existing research often overlooks data quality issues in retrieval results, which frequently stem from inaccurate vector-distance-based retrieval methods. We propose to improve the precision of RALMs’ answers from a data quality perspective through the Context-Driven Index Trimming (CDIT) framework, in which Context Matching Dependencies (CMDs) serve as logical data quality rules that capture and regulate the consistency between retrieved contexts. Drawing on the semantic comprehension capabilities of Large Language Models (LLMs), CDIT identifies and discards retrieval results that are inconsistent with the query context and modifies the corresponding indexes in the database, thereby improving answer quality. Experiments demonstrate an average accuracy improvement of 3.75% on challenging open-domain question-answering tasks. The flexibility of CDIT is also verified through its compatibility with various language models and indexing methods, offering a promising approach to jointly bolster RALMs’ data quality and retrieval precision.
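
The trimming step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory `index` dictionary, the `trim_index` helper, and the keyword-overlap predicate `shares_keyword` are hypothetical stand-ins. In particular, the predicate replaces the LLM-based consistency judgment and the CMD rules, whose actual form the abstract does not specify.

```python
from typing import Callable, Dict, List, Tuple


def shares_keyword(query: str, context: str) -> bool:
    """Hypothetical stand-in for an LLM consistency check: any shared word longer than 3 letters."""
    def keywords(text: str) -> set:
        return {w for w in text.lower().split() if len(w) > 3}
    return bool(keywords(query) & keywords(context))


def trim_index(
    index: Dict[str, str],
    retrieved_ids: List[str],
    query: str,
    is_consistent: Callable[[str, str], bool],
) -> Tuple[List[str], Dict[str, str]]:
    """Discard retrieved entries judged inconsistent with the query and drop them from the index."""
    kept = [doc_id for doc_id in retrieved_ids if is_consistent(query, index[doc_id])]
    dropped = {doc_id for doc_id in retrieved_ids if doc_id not in kept}
    trimmed = {doc_id: text for doc_id, text in index.items() if doc_id not in dropped}
    return kept, trimmed


index = {
    "a": "Paris is the capital of France",
    "b": "Bananas are rich in potassium",
}
kept, trimmed = trim_index(index, ["a", "b"], "capital of France", shares_keyword)
```

Passing the consistency check in as a function keeps the sketch independent of any particular language model or indexing method. Whether a real system should delete inconsistent entries outright or only modify them is a design choice this sketch does not settle.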

Venues: Findings (2)
