Shishir G Patil
2026
AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following
Yun He | Wenzhe Li | Hejia Zhang | Songlin Li | Karishma Mandyam | Sopan Khosla | Yuanhao Xiong | Nanshu Wang | Xiaoliang Peng | Beibin Li | Shengjie Bi | Shishir G Patil | Qi Qi | Shengyu Feng | Julian Katz-Samuels | Richard Yuanzhe Pang | Sujan Kumar Gonugondla | Hunter Lang | Yue Yu | Yundi Qian | Maryam Fazel-Zarandi | Licheng Yu | Amine Benhalloum | Hany Hassan Awadalla | Manaal Faruqui
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF)—especially for complex, multi-turn, and system-prompted instructions—remains a significant challenge. Rigorous evaluation and effective training for such capabilities are hindered by the lack of high-quality, human-annotated benchmarks and reliable, interpretable reward signals. In this work, we introduce AdvancedIF, a comprehensive benchmark featuring over 1,600 prompts and expert-curated rubrics that assess LLMs’ ability to follow complex, multi-turn, and system-level instructions. We also open-source the evaluation script of AdvancedIF. We further propose RIFL (Rubric-based Instruction-Following Learning), a novel post-training pipeline that leverages rubric generation, a finetuned rubric verifier, and reward shaping to enable effective reinforcement learning for instruction following. Extensive experiments demonstrate that RIFL substantially improves the instruction-following abilities of LLMs, achieving a 6.7% absolute gain on AdvancedIF and strong results on public benchmarks. Our ablation studies confirm the effectiveness of each component in RIFL. This work establishes rubrics as a powerful tool for both training and evaluating advanced IF in LLMs, paving the way for more capable and reliable AI systems.
2025
Language Models Can Easily Learn to Reason from Demonstrations
Dacheng Li | Shiyi Cao | Tyler Griggs | Shu Liu | Xiangxi Mo | Eric Tang | Sumanth Hegde | Kourosh Hakhamaneshi | Shishir G Patil | Matei Zaharia | Joseph E. Gonzalez | Ion Stoica
Findings of the Association for Computational Linguistics: EMNLP 2025
Large reasoning models (LRMs) tackle complex problems by following long chains of thought (Long CoT) that incorporate reflection, backtracking, and self-validation. However, the training techniques and data requirements to elicit Long CoT remain poorly understood. In this work, we find that language models can effectively learn Long CoT reasoning through data-efficient supervised fine-tuning (SFT) and further parameter-efficient low-rank adaptation (LoRA). Crucially, we find that the structure of Long CoT is critical to this data-efficient fine-tuning process. Training on content-incorrect examples, e.g., those that lead to incorrect answers or contain corrupted digits, still leads to significant performance gains. In contrast, training on structurally incorrect examples, e.g., those with shuffled or deleted reasoning steps, yields smaller improvements or even degrades performance.
2024
LLoCO: Learning Long Contexts Offline
Sijun Tan | Xiuyu Li | Shishir G Patil | Ziyang Wu | Tianjun Zhang | Kurt Keutzer | Joseph E. Gonzalez | Raluca Ada Popa
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose LLoCO, a novel approach to address this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning with LoRA. Our method enables an LLM to create a concise representation of the original context and efficiently retrieve relevant information to answer questions accurately. Our approach extends the effective context window of a 4k-token LLaMA2-7B model to handle up to 128k tokens. We evaluate our approach on several long-context question-answering datasets, demonstrating that LLoCO significantly outperforms in-context learning while using 30× fewer tokens during inference. LLoCO achieves up to 7.62× speed-up during inference and 11.52× higher throughput during finetuning, substantially reducing the cost of long-document question answering. This makes it a promising solution for efficient long-context processing.
Co-authors
- Joseph E. Gonzalez 2
- Amine Benhalloum 1
- Shengjie Bi 1
- Shiyi Cao 1
- Manaal Faruqui 1
- Maryam Fazel-Zarandi 1
- Shengyu Feng 1
- Sujan Kumar Gonugondla 1
- Tyler Griggs 1
- Kourosh Hakhamaneshi 1
- Hany Hassan Awadalla 1
- Yun He 1
- Sumanth Hegde 1
- Julian Katz-Samuels 1
- Kurt Keutzer 1
- Sopan Khosla 1
- Hunter Lang 1
- Beibin Li 1
- Dacheng Li 1
- Songlin Li 1
- Wenzhe Li 1
- Xiuyu Li 1
- Shu Liu 1
- Karishma Mandyam 1
- Xiangxi Mo 1
- Richard Yuanzhe Pang 1
- Xiaoliang Peng 1
- Raluca Ada Popa 1
- Qi Qi 1
- Yundi Qian 1
- Ion Stoica 1
- Sijun Tan 1
- Eric Tang 1
- Nanshu Wang 1
- Ziyang Wu 1
- Yuanhao Xiong 1
- Licheng Yu 1
- Yue Yu 1
- Matei Zaharia 1
- Hejia Zhang 1
- Tianjun Zhang 1