Soumyabrata Pal


2025

pdf bib
From Selection to Generation: A Survey of LLM-based Active Learning
Yu Xia | Subhojyoti Mukherjee | Zhouhang Xie | Junda Wu | Xintong Li | Ryan Aponte | Hanjia Lyu | Joe Barrow | Hongjie Chen | Franck Dernoncourt | Branislav Kveton | Tong Yu | Ruiyi Zhang | Jiuxiang Gu | Nesreen K. Ahmed | Yu Wang | Xiang Chen | Hanieh Deilamsalehy | Sungchul Kim | Zhengmian Hu | Yue Zhao | Nedim Lipka | Seunghyun Yoon | Ting-Hao Kenneth Huang | Zichao Wang | Puneet Mathur | Soumyabrata Pal | Koyel Mukherjee | Zhehao Zhang | Namyong Park | Thien Huu Nguyen | Jiebo Luo | Ryan A. Rossi | Julian McAuley
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the increasing importance of high-quality data and efficient model training in the era of LLMs, we present a comprehensive survey on LLM-based Active Learning. We introduce an intuitive taxonomy that categorizes these techniques and discuss the transformative roles LLMs can play in the active learning loop. We further examine the impact of AL on LLM learning paradigms and its applications across various domains. Finally, we identify open challenges and propose future research directions. This survey aims to serve as an up-to-date resource for researchers and practitioners seeking to gain an intuitive understanding of LLM-based AL techniques and deploy them to new applications.

pdf bib
PromptRefine: Enhancing Few-Shot Performance on Low-Resource Indic Languages with Example Selection from related Example Banks
Soumya Suvra Ghosal | Soumyabrata Pal | Koyel Mukherjee | Dinesh Manocha
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Large Language Models (LLMs) have recently demonstrated impressive few-shot learning capabilities through in-context learning (ICL). However, ICL performance is highly dependent on the choice of few-shot demonstrations, making the selection of the most optimal examples a persistent research challenge. This issue is further amplified in low-resource Indic languages, where the scarcity of ground-truth data complicates the selection process. In this work, we propose PromptRefine, a novel Alternating Minimization approach for example selection that improves ICL performance on low-resource Indic languages. PromptRefine leverages auxiliary example banks from related high-resource Indic languages and employs multi-task learning techniques to align language-specific retrievers, enabling effective cross-language retrieval. Additionally, we incorporate diversity in the selected examples to enhance generalization and reduce bias. Through comprehensive evaluations on four text generation tasks—Cross-Lingual Question Answering, Multilingual Question Answering, Machine Translation, and Cross-Lingual Summarization using state-of-the-art LLMs such as LLAMA-3.1-8B, LLAMA-2-7B, Qwen-2-7B, and Qwen-2.5-7B, we demonstrate that PromptRefine significantly outperforms existing frameworks for retrieving examples.