Shishir G Patil
2025
Language Models Can Easily Learn to Reason from Demonstrations
Dacheng Li | Shiyi Cao | Tyler Griggs | Shu Liu | Xiangxi Mo | Eric Tang | Sumanth Hegde | Kourosh Hakhamaneshi | Shishir G Patil | Matei Zaharia | Joseph E. Gonzalez | Ion Stoica
Findings of the Association for Computational Linguistics: EMNLP 2025
Large reasoning models (LRMs) tackle complex problems by following long chains of thought (Long CoT) that incorporate reflection, backtracking, and self-validation. However, the training techniques and data requirements needed to elicit Long CoT remain poorly understood. In this work, we find that language models can effectively learn Long CoT reasoning through data-efficient supervised fine-tuning (SFT) and parameter-efficient low-rank adaptation (LoRA). Crucially, we find that the structure of the Long CoT is critical to this data-efficient learning process. Training on content-incorrect examples, e.g., those leading to incorrect answers or containing corrupted digits, still yields significant performance gains. In contrast, training on structurally incorrect examples, e.g., those with shuffled or deleted reasoning steps, yields smaller improvements or even degrades performance.
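The low-rank adaptation (LoRA) technique the abstract relies on can be sketched numerically: the frozen pretrained weight W is augmented with a trainable low-rank update (alpha/r)·B·A, so only a small fraction of parameters is trained. All dimensions below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only (not from the paper).
d_out, d_in, r = 512, 512, 8      # rank r << d
alpha = 16                        # LoRA scaling factor

W = rng.normal(size=(d_out, d_in))       # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable low-rank factor
B = np.zeros((d_out, r))                 # zero-initialized, so W' == W at start

def lora_forward(x):
    # Effective weight W' = W + (alpha / r) * B @ A; only A and B are trained.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# Before any adapter training, the LoRA branch contributes nothing:
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters vs. full fine-tuning of this layer:
full = W.size
lora = A.size + B.size
print(f"LoRA trains {lora} of {full} params ({100 * lora / full:.1f}%)")
```

With these sizes, the adapter trains 8,192 of 262,144 weights (about 3%), which is the sense in which fine-tuning is parameter-efficient.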
2024
LLoCO: Learning Long Contexts Offline
Sijun Tan | Xiuyu Li | Shishir G Patil | Ziyang Wu | Tianjun Zhang | Kurt Keutzer | Joseph E. Gonzalez | Raluca Ada Popa
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose LLoCO, a novel approach that addresses this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning with LoRA. Our method enables an LLM to create a concise representation of the original context and efficiently retrieve relevant information to answer questions accurately. Our approach extends the effective context window of a 4k-token LLaMA2-7B model to handle up to 128k tokens. We evaluate LLoCO on several long-context question-answering datasets, demonstrating that it significantly outperforms in-context learning while using 30× fewer tokens during inference. LLoCO achieves up to 7.62× speed-up during inference and 11.52× higher throughput during finetuning, substantially reducing the cost of long-document question answering. This makes it a promising solution for efficient long-context processing.
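The 30× token reduction described above comes from compressing a long context offline into a much shorter sequence of summary representations before inference. The toy sketch below uses simple mean-pooling as a stand-in for LLoCO's learned compressor (which is an actual encoder finetuned with LoRA); all numbers and the pooling scheme are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; LLoCO's real compressor is a learned model,
# not fixed-window pooling.
n_tokens, d_model, ratio = 3000, 128, 30

# Stand-in for the token embeddings of a long document.
context = rng.normal(size=(n_tokens, d_model))

# Offline compression step (toy version): collapse every `ratio`
# consecutive token embeddings into one summary embedding.
summaries = context.reshape(n_tokens // ratio, ratio, d_model).mean(axis=1)

print(summaries.shape)  # (100, 128): 30x fewer positions fed to the LLM
```

The point of the sketch is only the shape arithmetic: at inference the model attends over 100 summary positions instead of 3,000 raw tokens, which is where the reported speed-up originates.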