Swayam Chatterjee
2026
JU-NLP-PG at RAG4Reports 2026: Memory-Efficient Multilingual Report Generation with 4-bit Quantized LLMs
Swayam Chatterjee | Dipankar Das
Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026)
This paper describes our system for Task B (Multilingual Report Generation) of RAG4Reports 2026 at ACL 2026, submitted under run ID ju_nlp_pg. Given a report request in English, the system retrieves relevant passages from a multilingual corpus of four million documents (English, Chinese, Russian, Arabic) and generates a grounded, citation-bearing report. The core challenge was fitting a large retrieval corpus and a capable generative model on a two-GPU node with ≈29 GB of RAM. We addressed it with three techniques: (1) 4-bit NF4 quantization, which shrinks the LLM from ≈14 GB to ≈4 GB; (2) memory-mapped, chunked FAISS index construction over pre-computed multilingual-e5-large embeddings; and (3) a strict model-loading order to prevent heap fragmentation. In addition, the reports are structured around topic nuggets to directly target the Auto-ARGUE evaluation signal.
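The memory figures above can be sanity-checked with a rough back-of-the-envelope sketch. It is not the authors' code, and several inputs are assumptions not stated in the abstract: a parameter count of about 7 billion (inferred from the ≈14 GB 16-bit footprint), one 1024-dimensional float32 embedding per document, and a hypothetical chunk size. The helper names are illustrative only.

```python
GB = 1e9  # decimal gigabytes, matching the rounded figures in the abstract


def weights_gb(n_params: float, bits_per_param: float) -> float:
    """Size of the model weights alone (no activations, no KV cache)."""
    return n_params * bits_per_param / 8 / GB


def embeddings_gb(n_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Size of a dense embedding matrix stored as float32 by default."""
    return n_vectors * dim * bytes_per_value / GB


def chunk_ranges(n_rows: int, chunk_rows: int):
    """Yield (start, stop) row ranges so only one chunk is resident at a time."""
    for start in range(0, n_rows, chunk_rows):
        yield start, min(start + chunk_rows, n_rows)


# Assumption: ~7e9 parameters, inferred from the ~14 GB 16-bit size.
N_PARAMS = 7e9
fp16_gb = weights_gb(N_PARAMS, 16)  # 16-bit weights
nf4_gb = weights_gb(N_PARAMS, 4)    # 4-bit weights, before quantization overhead

# Assumption: one 1024-dim float32 vector for each of the four million documents.
emb_gb = embeddings_gb(4_000_000, 1024)

# Hypothetical chunk size; each chunk would be read from the memory-mapped
# embedding file and added to the index before the next one is touched.
chunks = list(chunk_ranges(4_000_000, 500_000))

print(f"16-bit weights: {fp16_gb:.1f} GB, 4-bit weights: {nf4_gb:.1f} GB")
print(f"embeddings: {emb_gb:.1f} GB in {len(chunks)} chunks")
```

Under these assumptions the raw 4-bit weights come to 3.5 GB, and quantization constants plus any layers kept in higher precision would plausibly account for the remainder of the reported ≈4 GB. The embedding matrix alone would occupy more than half of the ≈29 GB of RAM, which suggests why the index is built chunk by chunk from a memory-mapped file rather than loaded in one piece.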