Yanghao Li


2025

pdf bib
Improve Vision Language Model Chain-of-thought Reasoning
Ruohong Zhang | Bowen Zhang | Yanghao Li | Haotian Zhang | Zhiqing Sun | Zhe Gan | Yinfei Yang | Ruoming Pang | Yiming Yang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes often relying on datasets dominated by short annotations with minimal rationales. In this work, we show that training VLM on short answers leads to poor generalization on reasoning tasks that require more detailed explanations. To address this limitation, we propose a two-stage post-training strategy that extends the usage of short answer data for enhanced CoT reasoning. First, we augment short answers with CoT reasoning generated by GPT-4o, enhancing the VLM’s CoT capabilities through fine-tuning. Second, we leverage short answers as outcome rewards for reinforcement learning. Specifically, short answers are used as correctness indicators to construct positive (correct) and negative (incorrect) pairs from model-generated reasoning chains. These pairs are then used to calibrate the model’s reasoning via Direct Preference Optimization. Our experiments show significant improvements in CoT reasoning on benchmark datasets, along with enhanced generalization to direct answer prediction. This work provides a critical data resource for VLM CoT training and demonstrates the effectiveness of outcome rewards for multimodal models post-training.

pdf bib
Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs
Runchu Tian | Yanghao Li | Yuepeng Fu | Siyang Deng | Qinyu Luo | Cheng Qian | Shuo Wang | Xin Cong | Zhong Zhang | Yesai Wu | Yankai Lin | Huadong Wang | Xiaojiang Liu
Findings of the Association for Computational Linguistics: ACL 2025

Positional bias in large language models hinders their ability to effectively process long inputs. A prominent example is the “lost in the middle” phenomenon, where LLMs struggle to utilize relevant information situated in the middle of the input. While prior research primarily focuses on single pieces of relevant information, real-world applications often involve multiple relevant information pieces. To bridge this gap, we present LongPiBench, a benchmark designed to assess positional bias involving multiple pieces of relevant information. It includes various tasks and input lengths. Thorough experiments are conducted with three commercial and six open-source models. These experiments reveal that while most current models are more robust against the “lost in the middle” issue, there also exist noticeable biases related to the spacing of relevant information pieces. These findings highlight the importance of evaluating and reducing positional biases for long-context LLMs.