Neelesh Kumar Shukla

2026

Annotating Indian Regional Biases using Large Language Models: Evaluation and Analysis
Debasmita Panda | Akash Anil | Neelesh Kumar Shukla
Proceedings of the 15th Joint Conference on Lexical and Computational Semantics (*SEM 2026)

Social biases based on regional identity (or regional bias) are often observed in Indian contexts on major online social networks and require critical attention. However, due to large linguistic and cultural diversity, high annotation costs, and inherent human biases, very little annotated data exists on regional biases in the Indian context. Recently, Large Language Models (LLMs) have garnered attention for the automatic annotation of text. However, such annotation efforts are largely limited to English texts, and LLMs often perform poorly when applied to low-resource languages. Therefore, this paper focuses on understanding the capabilities and challenges of popular open-source LLMs in annotating Indian regional biases. We utilize the recently proposed IndRegBias dataset, which consists of Indian regionally biased social media comments in both English and code-mixed formats. First, we assess the annotation capabilities of LLMs in a zero-shot setting and critically analyze their performance across different writing styles, including code-mixing, transliteration, and English. We find that the majority of LLMs exhibit low agreement with human annotations (measured using Cohen’s kappa). Consequently, we extend our study by fine-tuning the models using 50% of the data and evaluating them on the remaining 50%. We observe a significant improvement in annotation agreement (kappa) for all the LLMs. To further assess the capabilities of the fine-tuned models, we evaluate them on 500 newly collected social media comments discussing regional issues in India. The results show that most fine-tuned LLMs outperform their zero-shot counterparts when annotating these new comments.
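
Agreement between LLM and human annotations is measured here with Cohen’s kappa, κ = (p_o - p_e) / (1 - p_e). In this formula, p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label frequencies. The Python sketch below shows that computation. The function name `cohens_kappa` and the binary labels `"biased"`/`"neutral"` are hypothetical. They do not represent the IndRegBias label scheme or the authors' evaluation code.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items (hypothetical helper)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement p_o: share of items on which both annotators agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement p_e: product of the two annotators' marginal label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere, so kappa is undefined;
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels: one human annotator vs. one zero-shot LLM.
human = ["biased", "biased", "neutral", "neutral"]
llm = ["biased", "neutral", "neutral", "neutral"]
print(cohens_kappa(human, llm))  # 0.5
```

With these four hypothetical items, observed agreement is 0.75 and chance agreement is 0.5, which gives κ = 0.5. In practice, `sklearn.metrics.cohen_kappa_score` computes the same statistic.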

2025

KULFi Framework: Knowledge Utilization for Optimizing Large Language Models for Financial Causal Reasoning
Neelesh Kumar Shukla | Sandeep Singh | Prabhat Kumar Prabhakar | Sakthivel Thangaraj | Weiyi Sun | C Prasanna Venkatesan | Viji Krishnamurthy
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

This paper presents our contribution to the Financial Document Causality Detection (FinCausal) 2025 shared task. The FinCausal challenge centers on extracting cause-and-effect relationships from financial texts written in English and Spanish. We introduce KULFi, a novel Knowledge Utilization framework that augments the capabilities of Large Language Models (LLMs) by leveraging the expertise of more advanced reasoning models. KULFi uses Teacher LLMs to generate task-specific instructions and improves the performance of Student LLMs through automated prompt optimization. On the FinCausal task, the Student LLM optimized with KULFi achieves a similarity score comparable to that of human-guided prompt optimization for the same LLM, with significant improvements in causal reasoning performance. Our results show that KULFi enables effective knowledge transfer from stronger models to less capable ones, as well as efficient learning from training data. This minimizes the need for human input in prompt design and enables more precise causal analysis in financial contexts. Our system attained Semantic Answer Similarity (SAS) and Exact Match scores of 0.92 and 0.35 on the English dataset, and 0.92 and 0.09 on the Spanish dataset, respectively. The framework has potential applications in supporting decision-making in complex financial environments.
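
The system reaches an SAS score of 0.92 but an Exact Match score of only 0.35 (English) and 0.09 (Spanish). One plausible explanation for this gap is that exact matching is strict: a semantically correct answer worded differently from the reference scores zero. The Python sketch below illustrates an Exact Match score. The normalization step (case-folding and whitespace collapsing) and the example cause spans are assumptions made for illustration. This is not the official FinCausal 2025 scorer.

```python
from typing import Sequence


def normalize(text: str) -> str:
    # Assumed normalization: lower-case and collapse runs of whitespace.
    return " ".join(text.lower().split())


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predicted spans identical to the reference after normalization."""
    if len(predictions) != len(references) or len(predictions) == 0:
        raise ValueError("need equal-length, non-empty prediction and reference lists")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


# Hypothetical cause spans: the second prediction is a paraphrase, so it earns
# no exact-match credit even though a semantic similarity metric would reward it.
references = ["a decline in interest rates", "higher operating costs"]
predictions = ["A decline in  interest rates", "increased operating costs"]
print(exact_match(predictions, references))  # 0.5
```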

GraphRAG Analysis for Financial Narrative Summarization and A Framework for Optimizing Domain Adaptation
Neelesh Kumar Shukla | Prabhat Prabhakar | Sakthivel Thangaraj | Sandeep Singh | Weiyi Sun | C Prasanna Venkatesan | Viji Krishnamurthy
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

Large Language Models (LLMs) have shown promise in summarizing complex documents. However, their limitations in handling lengthy documents and capturing global information hinder their performance in tasks like Query-Focused Summarization (QFS). We explore GraphRAG, a retrieval-augmented generation approach that uses a globally summarized knowledge graph built with an LLM. We apply GraphRAG to the Financial Narrative Summarization (FNS) dataset, which consists of lengthy financial reports. Our results show that a naive RAG approach outperforms GraphRAG in terms of comprehensiveness, directness, conciseness, and completeness. However, we demonstrate that optimizing entity and relation extraction, using an LLM as the optimizer, can enhance GraphRAG’s performance. Our study highlights the need for domain-specific optimization to improve GraphRAG’s summarization capabilities in fact-heavy domains like finance. We propose an optimization framework that extends GraphRAG’s original domain adaptation strategy by incorporating entity and relation optimization, which improves the capture of relevant entities and relationships. Our findings contribute to the development of more effective summarization models for complex documents in finance and other domains.

Co-authors

Akash Anil 1

Debasmita Panda 1

Prabhat Prabhakar 1

Prabhat Kumar Prabhakar 1