Pulkit Chatwal

2026

Exploring Capability Thresholds in Ultra-Lightweight LLM Judges for Nugget-Based Report Evaluation
Mann Bajpai | Pulkit Chatwal | Priyanshu Deswal | Harish Pratap Singh | Santosh Kumar Mishra
Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026)

Reliable automatic evaluation of retrieval-grounded long-form reports typically requires human annotation or frontier-scale proprietary LLMs, both of which are expensive in resource-constrained settings. Team rgipt participated in RAG4Reports@ACL 2026 Task 1 with a zero-shot nugget-verification system that runs entirely on a single NVIDIA T4 GPU. We compare three ultra-lightweight decoder-only models (Qwen2-0.5B, Qwen2-1.5B, and Qwen2.5-0.5B) under identical inference conditions to examine how small an LLM judge can be while still providing a ranking signal aligned with human judgments. Both Qwen2 models produced a negative 𝜏_gap, whereas Qwen2.5-0.5B achieved 𝜏_gap = 0.0772 and Pearson r = 0.2209, ranking 13th of 21 teams. Within this model family and evaluation setting, model generation appears to matter more than parameter count, although this finding rests on three configurations evaluated on a single task and warrants further validation.

2025


Meta Prompting for Analyst Report Generation: Turning Earnings Calls into Investment Guidance
Pulkit Chatwal | Mann Bajpai | Priyanshu Deswal | Harish Pratap Singh | Santosh Kumar Mishra
Proceedings of The 10th Workshop on Financial Technology and Natural Language Processing


Enhancing Causal Relationship Detection Using Prompt Engineering and Large Language Models
Pulkit Chatwal | Amit Agarwal | Ankush Mittal
Proceedings of the Joint Workshop of the 9th Financial Technology and Natural Language Processing (FinNLP), the 6th Financial Narrative Processing (FNP), and the 1st Workshop on Large Language Models for Finance and Legal (LLMFinLegal)

This paper explores the use of large language models (LLMs) and prompt engineering to detect causal relationships in financial disclosures. The work was part of the FinCausal 2025 shared task, which focuses on identifying cause-and-effect relationships in financial texts across languages. The study demonstrates the effectiveness of LLMs, specifically LLaMA 3.2, for causality detection in English and Spanish financial reports. The paper applies several prompt engineering techniques, including zero-shot, few-shot, and chain-of-thought (CoT) prompting, to improve performance. For English, the best results were achieved with the Few-Shot + CoT approach, while for Spanish, the Few-Shot method provided strong semantic alignment despite lower exact match accuracy. Performance was evaluated with two metrics: Exact Match (EM) and Semantic Alignment Score (SAS). SAS scores were high for both languages, indicating good semantic understanding, with English performing particularly well. The study emphasizes the importance of tailoring prompt engineering techniques to language-specific nuances in financial contexts and suggests future research directions, including fine-tuning LLaMA 3.2 and testing additional LLM architectures to improve multilingual causality detection in financial texts.


Cultura-Arabica: Probing and Enhancing Arabic Cultural Awareness in Large Language Models via LoRA
Pulkit Chatwal | Santosh Kumar Mishra
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
