Sujan Kumar Gonugondla
2026
AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following
Yun He | Wenzhe Li | Hejia Zhang | Songlin Li | Karishma Mandyam | Sopan Khosla | Yuanhao Xiong | Nanshu Wang | Xiaoliang Peng | Beibin Li | Shengjie Bi | Shishir G Patil | Qi Qi | Shengyu Feng | Julian Katz-Samuels | Richard Yuanzhe Pang | Sujan Kumar Gonugondla | Hunter Lang | Yue Yu | Yundi Qian | Maryam Fazel-Zarandi | Licheng Yu | Amine Benhalloum | Hany Hassan Awadalla | Manaal Faruqui
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF)—especially for complex, multi-turn, and system-prompted instructions—remains a significant challenge. Rigorous evaluation and effective training for such capabilities are hindered by the lack of high-quality, human-annotated benchmarks and reliable, interpretable reward signals. In this work, we introduce AdvancedIF, a comprehensive benchmark featuring over 1,600 prompts and expert-curated rubrics that assess LLMs’ ability to follow complex, multi-turn, and system-level instructions. We also open-source the evaluation script of AdvancedIF. We further propose RIFL (Rubric-based Instruction-Following Learning), a novel post-training pipeline that leverages rubric generation, a finetuned rubric verifier, and reward shaping to enable effective reinforcement learning for instruction following. Extensive experiments demonstrate that RIFL substantially improves the instruction-following abilities of LLMs, achieving a 6.7% absolute gain on AdvancedIF and strong results on public benchmarks. Our ablation studies confirm the effectiveness of each component in RIFL. This work establishes rubrics as a powerful tool for both training and evaluating advanced IF in LLMs, paving the way for more capable and reliable AI systems.
2024
BASS: Batched Attention-optimized Speculative Sampling
Haifeng Qian | Sujan Kumar Gonugondla | Sungsoo Ha | Mingyue Shang | Sanjay Krishna Gouda | Ramesh Nallapati | Sudipta Sengupta | Xiaofei Ma | Anoop Deoras
Findings of the Association for Computational Linguistics: ACL 2024
Speculative decoding has emerged as a powerful method to improve latency and throughput in hosting large language models. However, most existing implementations focus on generating a single sequence. Real-world generative AI applications often require multiple responses, and performing speculative decoding in a batched setting while preserving its latency benefits poses non-trivial challenges. This paper describes a system of batched speculative decoding that sets a new state of the art in multi-sequence generation latency and that demonstrates superior GPU utilization as well as quality of generations within a time budget. For example, for a 7.8B-size model on a single A100 GPU and with a batch size of 8, each sequence is generated at an average speed of 5.8ms per token, the overall throughput being 1.1K tokens per second. These results represent state-of-the-art latency and a 2.15× speed-up over optimized regular decoding. Within a time budget in which regular decoding does not finish, our system is able to generate sequences with HumanEval Pass@First of 43% and Pass@All of 61%, far exceeding what’s feasible with single-sequence speculative decoding. Our peak GPU utilization during decoding reaches as high as 15.8%, more than 3× the highest achieved by regular decoding and around 10× that of single-sequence speculative decoding.
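For readers unfamiliar with the underlying technique, the following is a minimal sketch of the standard single-sequence speculative sampling acceptance step that batched systems build on. It is not the BASS implementation and does not show batching or attention optimizations; the function name `speculative_accept` and the toy probability lists are assumptions made for illustration.

```python
import random


def speculative_accept(p_target, q_draft, draft_token, rng):
    """Decide the fate of one token proposed by a draft model.

    p_target and q_draft are probability lists over the same vocabulary
    (hypothetical inputs; a real system would take them from model logits).
    draft_token is assumed to have been sampled from q_draft, so its
    draft probability is nonzero.
    Returns (token, accepted).
    """
    p = p_target[draft_token]
    q = q_draft[draft_token]
    # Accept the drafted token with probability min(1, p / q).
    if rng.random() < min(1.0, p / q):
        return draft_token, True
    # On rejection, resample from the residual distribution max(0, p - q),
    # so that the emitted token still follows the target distribution.
    residual = [max(0.0, pt - qd) for pt, qd in zip(p_target, q_draft)]
    token = rng.choices(range(len(residual)), weights=residual)[0]
    return token, False
```

When the draft and target distributions agree, every drafted token is accepted; when the target assigns zero probability to the drafted token, it is always rejected and replaced by a token the target prefers.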
Token Alignment via Character Matching for Subword Completion
Ben Athiwaratkun | Shiqi Wang | Mingyue Shang | Yuchen Tian | Zijian Wang | Sujan Kumar Gonugondla | Sanjay Krishna Gouda | Robert Kwiatkowski | Ramesh Nallapati | Parminder Bhatia | Bing Xiang
Findings of the Association for Computational Linguistics: ACL 2024
Generative models, widely utilized in various applications, can often struggle with prompts corresponding to partial tokens. This struggle stems from tokenization, where partial tokens fall out of distribution during inference, leading to incorrect or nonsensical outputs. This paper examines a technique to alleviate the tokenization artifact on text completion in generative models, maintaining performance even in regular non-subword cases. The method, termed token alignment, involves backtracking to the last complete tokens and ensuring the model’s generation aligns with the prompt. This approach showcases marked improvement across many partial token scenarios, including nuanced cases like space-prefix and partial indentation, with only a minor time increase. The technique and analysis detailed in this paper contribute to the continuous advancement of generative models in handling partial inputs, bearing relevance for applications like code completion and text.
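The backtrack-and-align idea described in this abstract can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: the vocabulary, the `score` callback standing in for a language model, and the function names are all hypothetical. The prompt's last token is dropped, its characters become a pending constraint, and only tokens consistent with those characters are allowed until the constraint is consumed.

```python
def allowed_tokens(pending, vocab):
    """Tokens whose characters are consistent with the pending prompt text."""
    if not pending:
        return list(vocab)
    # A token is allowed if it covers the pending text (and may extend it)
    # or if it is a prefix of the pending text (and consumes part of it).
    return [t for t in vocab if t.startswith(pending) or pending.startswith(t)]


def consume(pending, token):
    """Pending characters that remain after emitting `token`."""
    return pending[len(token):] if pending.startswith(token) else ""


def align_and_complete(prompt_tokens, vocab, score, backtrack=1, max_new=4):
    """Backtrack `backtrack` tokens, then generate greedily under the constraint.

    `score(context_tokens, candidate)` is a hypothetical stand-in for a model;
    `vocab` is assumed to contain no empty strings.
    """
    kept = list(prompt_tokens[:len(prompt_tokens) - backtrack])
    pending = "".join(prompt_tokens[len(kept):])
    generated = []
    for _ in range(max_new):
        candidates = allowed_tokens(pending, vocab)
        if not candidates:
            break
        token = max(candidates, key=lambda t: score(kept + generated, t))
        generated.append(token)
        pending = consume(pending, token)
    return kept, generated


if __name__ == "__main__":
    toy_vocab = [" pr", " print", " p", "rint", "(", "int"]
    kept, new = align_and_complete(
        ["x", " =", " pr"], toy_vocab, score=lambda ctx, t: len(t), max_new=1
    )
    print("".join(kept + new))
```

In this toy run the partial token " pr" is backtracked and the completion is restricted to tokens matching those characters, so the output always begins with the original prompt text even when the chosen token differs from the one the prompt was tokenized into.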
Co-authors
- Sanjay Krishna Gouda 2
- Ramesh Nallapati 2
- Mingyue Shang 2
- Ben Athiwaratkun 1
- Amine Benhalloum 1
- Parminder Bhatia 1
- Shengjie Bi 1
- Anoop Deoras 1
- Manaal Faruqui 1
- Maryam Fazel-Zarandi 1
- Shengyu Feng 1
- Sungsoo Ha 1
- Hany Hassan Awadalla 1
- Yun He 1
- Julian Katz-Samuels 1
- Sopan Khosla 1
- Robert Kwiatkowski 1
- Hunter Lang 1
- Wenzhe Li 1
- Songlin Li 1
- Beibin Li 1
- Xiaofei Ma 1
- Karishma Mandyam 1
- Richard Yuanzhe Pang 1
- Shishir G Patil 1
- Xiaoliang Peng 1
- Qi Qi 1
- Yundi Qian 1
- Haifeng Qian 1
- Sudipta Sengupta 1
- Yuchen Tian 1
- Nanshu Wang 1
- Shiqi Wang 1
- Zijian Wang 1
- Bing Xiang 1
- Yuanhao Xiong 1
- Yue Yu 1
- Licheng Yu 1
- Hejia Zhang 1