Swayam Chatterjee


2026

This paper describes our system for Task B (Multilingual Report Generation) of RAG4Reports 2026 at ACL 2026, submitted under run ID ju_nlp_pg. Given a report request in English, the system retrieves relevant passages from a corpus of four million multilingual documents (English, Chinese, Russian, Arabic) and generates a grounded, citation-bearing report. The core challenge was fitting a large retrieval corpus and a capable generative model on a two-GPU node with ≈29 GB of RAM. We addressed it with three techniques: (1) 4-bit NF4 quantization, which shrinks the LLM from ≈14 GB to ≈4 GB; (2) memory-mapped, chunked FAISS index construction over pre-computed multilingual-e5-large embeddings; and (3) a strict model-loading order to prevent heap fragmentation. In addition, the generated reports are structured around topic nuggets to directly target the Auto-ARGUE evaluation signal.
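
The memory figures above can be sanity-checked with a little arithmetic. The sketch below is illustrative only and rests on assumptions that are not stated in the text: a 7-billion-parameter model (inferred from the ≈14 GB footprint at 16 bits per weight), one float32 embedding per document, the 1024-dimensional output of multilingual-e5-large, and a hypothetical chunk size of 100,000 vectors. The helper names are ours, not part of the submitted system.

```python
GB = 1e9  # decimal gigabytes


def weights_gb(n_params: float, bits_per_param: float) -> float:
    """Size of the raw model weights in GB at a given bit width."""
    return n_params * bits_per_param / 8 / GB


def embeddings_gb(n_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Size of a dense embedding matrix in GB (float32 by default)."""
    return n_vectors * dim * bytes_per_value / GB


def chunk_ranges(n_vectors: int, chunk_size: int):
    """Yield (start, end) row ranges so only one chunk is resident at a time."""
    for start in range(0, n_vectors, chunk_size):
        yield start, min(start + chunk_size, n_vectors)


# ASSUMPTION: 7e9 parameters, chosen to match the ~14 GB 16-bit footprint.
N_PARAMS = 7e9
fp16_gb = weights_gb(N_PARAMS, 16)
nf4_gb = weights_gb(N_PARAMS, 4)  # raw 4-bit weights, before any overhead

# ASSUMPTION: one 1024-dim float32 vector per document in the corpus.
N_DOCS = 4_000_000
DIM = 1024
all_embeddings_gb = embeddings_gb(N_DOCS, DIM)

# ASSUMPTION: hypothetical chunk size for incremental index construction.
CHUNK_SIZE = 100_000
chunks = list(chunk_ranges(N_DOCS, CHUNK_SIZE))
one_chunk_gb = embeddings_gb(CHUNK_SIZE, DIM)

print(f"16-bit weights:      {fp16_gb:.1f} GB")
print(f"4-bit weights (raw): {nf4_gb:.1f} GB")
print(f"all embeddings:      {all_embeddings_gb:.2f} GB")
print(f"chunks: {len(chunks)}, each about {one_chunk_gb:.2f} GB")
```

Under these assumptions the raw 4-bit weights come to 3.5 GB, so the ≈4 GB figure plausibly includes quantization constants and any layers kept at higher precision. The full embedding matrix would occupy over half of the ≈29 GB of RAM if loaded at once, whereas a single chunk needs well under 1 GB, which is the motivation for reading the embeddings through a memory map and adding them to the index chunk by chunk.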