Dehui Du
2025
Reversal of Thought: Enhancing Large Language Models with Preference-Guided Reverse Reasoning Warm-up
Jiahao Yuan
|
Dehui Du
|
Hao Zhang
|
Zixiang Di
|
Usman Naseem
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have shown remarkable performance in reasoning tasks but face limitations in mathematical and complex logical reasoning. Existing methods to improve LLMs’ logical capabilities either involve traceable or verifiable logical sequences that generate more reliable responses by constructing logical structures yet increase computational costs, or introduces rigid logic template rules, reducing flexibility. In this paper, we propose Reversal of Thought (RoT), a plug-and-play and cost-effective reasoning framework designed to enhance the logical reasoning abilities of LLMs during the warm-up phase prior to batch inference. RoT utilizes a Preference-Guided Reverse Reasoning warm-up strategy, which integrates logical symbols for pseudocode planning through meta-cognitive mechanisms and pairwise preference self-evaluation to generate task-specific prompts solely through demonstrations, aligning with LLMs’ cognitive preferences shaped by RLHF. Through reverse reasoning, we utilize a Cognitive Preference Manager to assess knowledge boundaries and further expand LLMs’ reasoning capabilities by aggregating solution logic for known tasks and stylistic templates for unknown tasks. Experiments across various tasks demonstrate that RoT surpasses existing baselines in both reasoning accuracy and efficiency.
LLMSR@XLLM25: Less is More: Enhancing Structured Multi-Agent Reasoning via Quality-Guided Distillation
Jiahao Yuan
|
Xingzhe Sun
|
Xing Yu
|
Jingwen Wang
|
Dehui Du
|
Zhiqing Cui
|
Zixiang Di
Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)
The LLMSR@XLLM25 formulates a low-resource structural reasoning task that challenges LLMs to generate interpretable, step-by-step rationales with minimal labeled data. We present Less is More, the third-place winning approach in the LLMSR@XLLM25, which focuses on structured reasoning from only 24 labeled examples. Our approach leverages a multi-agent framework with reverse-prompt induction, retrieval-augmented reasoning synthesis via GPT-4o, and dual-stage reward-guided filtering to distill high-quality supervision across three subtasks: question parsing, CoT parsing, and step-level verification. All modules are fine-tuned from Meta-Llama-3-8B-Instruct under a unified LoRA+ setup. By combining structure validation with reward filtering across few-shot and zero-shot prompts, our pipeline consistently improves structure reasoning quality. These results underscore the value of controllable data distillation in enhancing structured inference under low-resource constraints. Our code is available at https://github.com/JhCircle/Less-is-More.
Search
Fix author
Co-authors
- Zixiang Di 2
- Jiahao Yuan 2
- Zhiqing Cui 1
- Usman Naseem 1
- Xingzhe Sun 1
- show all...