Joydeb Mondal

2025

ExpertNeurons at SciVQA-2025: Retrieval Augmented VQA with Vision Language Model (RAVQA-VLM)
Nagaraj N Bhat | Joydeb Mondal | Srijon Sarkar
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)

We introduce RAVQA-VLM, a novel Retrieval-Augmented Generation (RAG) architecture with a Vision Language Model for the SciVQA challenge, which targets closed-ended visual and non-visual questions over scientific figures drawn from ACL Anthology and arXiv papers (Borisova and Rehm, 2025). Our system first encodes each input figure and its accompanying metadata (caption, figure ID, type) into dense embeddings, then retrieves context passages from the full PDF of the source paper via a Dense Passage Retriever (Karpukhin et al., 2020). The retrieved contexts are concatenated with the question and passed to a vision-capable generative backbone (e.g., Phi-3.5, Pixtral-12B, Mixtral-24B-small, InterVL-3-14B) fine-tuned on the 15.1K SciVQA training examples (Yang et al., 2023; Pramanick et al., 2024). We jointly optimize retrieval and generation end-to-end to minimize answer loss and mitigate hallucinations (Lewis et al., 2020; Rujun Han and Castelli, 2024). On the SciVQA test set, RAVQA-VLM improves over parametric-only baselines, with relative gains of +5% ROUGE-1 and +5% ROUGE-L, demonstrating the efficacy of RAG for multimodal scientific QA.

2022

ExpertNeurons at FinCausal 2022 Task 2: Causality Extraction for Financial Documents
Joydeb Mondal | Nagaraj Bhat | Pramir Sarkar | Shahid Reza
Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022

This paper describes the approach we built for causality extraction from financial documents and submitted to FinCausal 2022 Task 2. We provide a solution with intelligent pre-processing and post-processing to detect the number of causes and effects in a financial document and extract them. Our approach achieved a weighted-average F1 score of 90% on the official blind evaluation dataset.
