Gang Yu
2025
Reason from Future: Reverse Thought Chain Enhances LLM Reasoning
Yinlong Xu | Yanzhao Zheng | Shuoshuo Sun | Shuaihan Huang | Baohua Dong | Zhu Hangcheng | Ruohui Huang | Gang Yu | Hongxia Xu | Jian Wu
Findings of the Association for Computational Linguistics: ACL 2025
It has been demonstrated that carefully designed reasoning paradigms, like Chain-of-Thought (CoT) and Tree-of-Thought (ToT), can enhance the reasoning capabilities of small language models through detailed thinking and extensive thought searching, but unbounded branching factors in the search space create prohibitive reasoning cost. Moreover, these methods fall into the trap of locally optimal reasoning, meaning the model lacks a global perspective while solving problems. We propose a novel reasoning paradigm called Reason from Future (RFF), which generates reasoning paths via bidirectional reasoning that combines top-down planning with bottom-up reasoning accumulation. The essence of RFF lies in its reverse reasoning mechanism, which prioritizes core logical relationships and imposes goal-oriented constraints on intermediate steps, thereby reducing the search space and mitigating the error accumulation inherent in sequential forward reasoning. Empirical evaluations across diverse experiments demonstrate that RFF outperforms conventional paradigms, achieving higher accuracy with a smaller search space on complex tasks.
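The bidirectional idea in the abstract can be sketched as two loops: plan backward from the goal, then solve forward under the resulting subgoal constraints. The snippet below is a minimal illustration of that pattern, not the authors' implementation; the `llm` callable, prompt wording, and fixed step count are assumptions for the sketch.

```python
def reason_from_future(question: str, llm, max_steps: int = 5) -> str:
    """Plan backward from the goal, then reason forward under those constraints."""
    # Top-down pass: repeatedly ask which intermediate result the current target
    # needs, building a reverse chain of subgoals (future -> present).
    subgoals = []
    target = question
    for _ in range(max_steps):
        prior = llm(f"To answer '{target}', what intermediate result is needed first?")
        subgoals.append(prior)
        target = prior

    # Bottom-up pass: solve the subgoals in reverse order, so each forward step
    # is constrained by the goal-oriented plan rather than searched freely.
    context = question
    for goal in reversed(subgoals):
        context = llm(f"Given: {context}\nWork out this subgoal: {goal}")

    return llm(f"Given: {context}\nNow give the final answer to: {question}")
```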
2024
Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data
Yanda Li | Chi Zhang | Gang Yu | Wanqi Yang | Zhibin Wang | Bin Fu | Guosheng Lin | Chunhua Shen | Ling Chen | Yunchao Wei
Findings of the Association for Computational Linguistics: ACL 2024
The remarkable multimodal capabilities demonstrated by OpenAI’s GPT-4 have sparked significant interest in the development of multimodal Large Language Models (LLMs). A primary research objective of such models is to align visual and textual modalities effectively while comprehending human instructions. Current methodologies often rely on annotations derived from benchmark datasets to construct image-dialogue datasets for training purposes, akin to instruction tuning in LLMs. However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models. To mitigate these limitations, we propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning. This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models to yield a diverse and controllable dataset with varied image content. This not only provides greater flexibility compared to existing methodologies but also significantly enhances several model capabilities. Our research includes comprehensive experiments conducted on various datasets using the open-source LLaVA model as a testbed for our proposed pipeline. Our results underscore marked enhancements across more than ten commonly assessed capabilities.
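A rough sketch of the synchronous image-dialogue synthesis idea described above is given below. It is not the paper's pipeline: `chat_model` and `text_to_image` are hypothetical stand-ins for ChatGPT and a text-to-image generator, and the prompt and JSON schema are assumptions made for illustration.

```python
import json

def synthesize_example(topic: str, chat_model, text_to_image) -> dict:
    """Generate a matched (image, dialogue) pair for visual instruction tuning."""
    # 1. Ask the language model to invent an image caption plus a multi-turn
    #    dialogue grounded in that imagined image, returned as one JSON object.
    prompt = (
        f"Invent a detailed caption for an image about '{topic}', then write a "
        "short instruction-following dialogue grounded in that image. "
        'Reply as JSON: {"caption": "...", "dialogue": [{"user": "...", "assistant": "..."}]}'
    )
    record = json.loads(chat_model(prompt))

    # 2. Render the caption with a text-to-image model, so the dialogue and the
    #    pixels are synthesized together and stay aligned by construction.
    record["image"] = text_to_image(record["caption"])
    return record
```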