Mann Bajpai

2026

Exploring Capability Thresholds in Ultra-Lightweight LLM Judges for Nugget-Based Report Evaluation
Mann Bajpai | Pulkit Chatwal | Priyanshu Deswal | Harish Pratap Singh | Santosh Kumar Mishra
Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026)

Reliable automatic evaluation of retrieval-grounded long-form reports typically requires human annotation or frontier-scale proprietary LLMs, both of which are expensive in resource-constrained settings. Team rgipt participated in RAG4Reports@ACL 2026 Task 1 with a zero-shot nugget-verification system that runs entirely on a single NVIDIA T4 GPU. We compare three ultra-lightweight decoder-only models (Qwen2-0.5B, Qwen2-1.5B, and Qwen2.5-0.5B) under identical inference conditions to examine how small an LLM judge can be while retaining a human-aligned ranking signal. Both Qwen2 models produced a negative 𝜏_gap, whereas Qwen2.5-0.5B achieved 𝜏_gap = 0.0772 and Pearson r = 0.2209, ranking 13th of 21 teams. Within this family and evaluation setting, model generation appears to matter more than parameter count, although this finding rests on three configurations and a single task and warrants further validation.

2025

Meta Prompting for Analyst Report Generation: Turning Earnings Calls into Investment Guidance
Pulkit Chatwal | Mann Bajpai | Priyanshu Deswal | Harish Pratap Singh | Santosh Kumar Mishra
Proceedings of The 10th Workshop on Financial Technology and Natural Language Processing
