Sumanth Hegde




2025

Language Models Can Easily Learn to Reason from Demonstrations
Dacheng Li | Shiyi Cao | Tyler Griggs | Shu Liu | Xiangxi Mo | Eric Tang | Sumanth Hegde | Kourosh Hakhamaneshi | Shishir G Patil | Matei Zaharia | Joseph E. Gonzalez | Ion Stoica
Findings of the Association for Computational Linguistics: EMNLP 2025

Large reasoning models (LRMs) tackle complex problems by following long chains of thought (Long CoT) that incorporate reflection, backtracking, and self-validation. However, the training techniques and data requirements needed to elicit Long CoT remain poorly understood. In this work, we find that language models can effectively learn Long CoT reasoning through data-efficient supervised fine-tuning (SFT) and, further, through parameter-efficient low-rank adaptation (LoRA). Crucially, we find that the structure of Long CoT is critical to this data-efficient fine-tuning process. Training on content-incorrect examples, e.g., those that lead to incorrect answers or contain corrupted digits, still leads to significant performance gains. In contrast, training on structurally incorrect examples, e.g., those with shuffled or deleted reasoning steps, yields smaller improvements or even degrades performance.
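
As a rough illustration of the distinction drawn in the abstract, the Python sketch below shows one way content perturbations (corrupted digits) and structural perturbations (shuffled or deleted steps) might be applied to a reasoning trace. The function names, the list-of-strings representation of a trace, and the `rate` parameter are assumptions made for illustration; this is not the authors' implementation.

```python
import random
import re


def corrupt_digits(steps, rate, rng):
    """Content perturbation (hypothetical): replace each digit with a
    random digit with probability `rate`; step order is left intact."""
    def swap(match):
        return str(rng.randrange(10)) if rng.random() < rate else match.group(0)
    return [re.sub(r"\d", swap, step) for step in steps]


def shuffle_steps(steps, rng):
    """Structural perturbation (hypothetical): randomly reorder the steps."""
    shuffled = list(steps)
    rng.shuffle(shuffled)
    return shuffled


def delete_steps(steps, rate, rng):
    """Structural perturbation (hypothetical): drop each step with
    probability `rate`."""
    return [step for step in steps if rng.random() >= rate]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    trace = ["12 + 30 = 42", "42 * 2 = 84", "So the answer is 84."]
    print(corrupt_digits(trace, 0.5, rng))
    print(shuffle_steps(trace, rng))
    print(delete_steps(trace, 0.5, rng))
```

The first function keeps the order and wording of the steps and changes only their numeric content, while the other two leave each step's text untouched and change only how the steps are arranged, which mirrors the content-versus-structure contrast described above.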