2025
LLMForum-RAG: A Multilingual, Multi-domain Framework for Factual Reasoning via Weighted Retrieval and LLM Collaboration
Soham Chaudhuri
|
Dipanjan Saha
|
Dipankar Das
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
LLMs have emerged as a transformative technology, enabling a wide range of tasks such as text generation, summarization, and question answering. The use of RAG with LLMs is on the rise to provide deeper knowledge bases across various domains. In the present study, we propose a RAG framework that employs a weighted Rocchio mechanism for retrieval and a supervised LLM collaborative forum for generation. Our framework is evaluated on two downstream tasks, biomedical question answering (BioASQ-QA) and multilingual claim verification (in English, Hindi, and Bengali), to showcase its adaptability across domains and languages. The proposed retriever achieves substantial improvements over BM25 in Recall@5: +8% (BioASQ-QA), +15% (English), +5% (Hindi), and +20% (Bengali). In veracity classification, our framework achieves an average answer correctness of 0.78 on BioASQ-QA and F1-scores of 0.59, 0.56, and 0.41 for English, Hindi, and Bengali, respectively. These results demonstrate the effectiveness and robustness of our framework for retrieval and generation in multilingual and multi-domain settings.
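The Rocchio mechanism named here is a classic relevance-feedback update; a minimal sketch is below, assuming dense (or TF-IDF) document vectors and the standard alpha/beta/gamma weighting. The paper's exact weighting scheme is not given in the abstract, so the uniform averaging over feedback documents is an assumption.

```python
import numpy as np

def weighted_rocchio(query_vec, relevant, non_relevant,
                     alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio query refinement: move the query vector toward
    relevant documents and away from non-relevant ones. The weights
    here are the textbook defaults, not the paper's tuned values."""
    q = alpha * query_vec
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(non_relevant):
        q = q - gamma * np.mean(non_relevant, axis=0)
    return q

# Usage: refine the query, then re-rank the corpus by cosine similarity.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(100, 64))   # stand-in document vectors
q0 = rng.normal(size=64)
q1 = weighted_rocchio(q0, corpus[:3], corpus[3:5])
scores = corpus @ q1 / (np.linalg.norm(corpus, axis=1) * np.linalg.norm(q1))
top5 = np.argsort(-scores)[:5]        # Recall@5 candidates
```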
IWSLT 2025 Indic Track System Description Paper: Speech-to-Text Translation from Low-Resource Indian Languages (Bengali and Tamil) to English
Sayan Das
|
Soham Chaudhuri
|
Dipanjan Saha
|
Dipankar Das
|
Sivaji Bandyopadhyay
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
Multi-language Speech-to-Text Translation (ST) plays a crucial role in breaking linguistic barriers, particularly in multilingual regions like India. This paper focuses on building a robust ST system for low-resource Indian languages, with a special emphasis on Bengali and Tamil, which represent the Indo-Aryan and Dravidian families, respectively. The dataset used in this work comprises spoken content from TED Talks and conferences, paired with transcriptions in English and their translations in Bengali and Tamil. Our work specifically addresses the translation of Bengali and Tamil speech to English text, a critical area given the scarcity of annotated speech data. To enhance translation quality and model robustness, we leverage cross-lingual resources and word-level translation strategies. The ultimate goal is to develop an end-to-end ST model capable of real-world deployment for under-represented languages.
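The abstract does not fix an architecture; as one plausible end-to-end baseline for X-to-English ST, Whisper's built-in "translate" task decodes Bengali or Tamil speech directly into English text. The checkpoint name and audio path below are illustrative placeholders, not the paper's reported setup.

```python
# A minimal end-to-end ST baseline, assuming the HuggingFace transformers
# pipeline and Whisper's X->English "translate" decoding task; the checkpoint
# and audio file are placeholders, not the authors' configuration.
from transformers import pipeline

st = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",           # any multilingual Whisper checkpoint
    generate_kwargs={"task": "translate"},  # decode directly into English
)

result = st("bengali_talk.wav")             # hypothetical 16 kHz audio file
print(result["text"])                       # English translation of the speech
```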
JUNLP@LT-EDI-2025: Efficient Low-Rank Adaptation of Whisper for Inclusive Tamil Speech Recognition Targeting Vulnerable Populations
Priyobroto Acharya
|
Soham Chaudhuri
|
Sayan Das
|
Dipanjan Saha
|
Dipankar Das
Proceedings of the 5th Conference on Language, Data and Knowledge: Fifth Workshop on Language Technology for Equality, Diversity, Inclusion
Speech recognition has received extensive research attention in recent years. It becomes much more challenging when the speaker's age, gender, and other factors introduce variations in the speech. In this work, we propose a fine-tuned automatic speech recognition model derived from OpenAI's Whisper-large-v2. We experimented with both Whisper-large and Wav2Vec2-XLSR-large; Whisper-large achieved the lower WER and proved to be the superior model. We secured 4th rank in the LT-EDI-2025 shared task. Our implementation details and code are available at our GitHub repository.
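The title names low-rank adaptation (LoRA); a minimal sketch of attaching LoRA adapters to Whisper with the peft library is below. The rank, scaling factor, and target modules are illustrative assumptions, not the authors' reported hyperparameters.

```python
# A minimal LoRA setup for Whisper, assuming HuggingFace transformers + peft;
# r, lora_alpha, and target_modules are illustrative choices, not the
# authors' reported configuration.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

base = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large-v2")

lora_cfg = LoraConfig(
    r=16,                                 # low-rank update dimension
    lora_alpha=32,                        # scaling factor for the update
    target_modules=["q_proj", "v_proj"],  # attention projections in Whisper
    lora_dropout=0.05,
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()        # only adapter weights are trainable
# Training then proceeds with a standard seq2seq loop (e.g. Seq2SeqTrainer)
# on (audio features, Tamil transcript) pairs.
```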
SpeechEE@XLLM25: End-to-End Structured Event Extraction from Speech
Soham Chaudhuri
|
Diganta Biswas
|
Dipanjan Saha
|
Dipankar Das
|
Sivaji Bandyopadhyay
Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)
Event extraction from text is a complex task that involves the identification of event triggers and their supporting arguments. When applied to speech, this task becomes even more challenging due to the continuous nature of audio signals and the need for robust Automatic Speech Recognition (ASR). This paper proposes an approach that integrates ASR with event extraction by utilizing the Whisper model for speech recognition and a Text2Event2 Transformer for extracting events from English audio samples. The Whisper model is used to generate transcripts from audio, which are then fed into the Text2Event2 Transformer to identify event triggers and their arguments. This approach combines two difficult tasks into one, streamlining the process of extracting structured event information directly from audio. Our approach leverages a robust ASR system (Whisper) followed by a parameter-efficient transformer (Text2Event2 fine-tuned via LoRA) to extract structured events from raw speech. Unlike prior work trained on gold textual input, our pipeline is trained end-to-end on noisy ASR outputs. Despite significant resource constraints and data noise, our system ranked first in the ACL 2025 XLLM Shared Task II.
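A minimal sketch of the described cascade is below: Whisper produces a transcript, and a seq2seq model generates a linearized event structure from it. The event-model checkpoint name and the bracketed output format are hypothetical stand-ins for the paper's Text2Event2 setup.

```python
# A minimal sketch of the ASR -> event-extraction cascade described above,
# assuming HuggingFace pipelines; "my-org/text2event2-lora" and the
# linearized output format are hypothetical stand-ins for Text2Event2.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
event_model = pipeline(
    "text2text-generation",
    model="my-org/text2event2-lora",       # placeholder fine-tuned checkpoint
)

def extract_events(audio_path: str) -> str:
    transcript = asr(audio_path)["text"]   # step 1: speech -> noisy text
    # step 2: noisy text -> linearized events, e.g. Text2Event-style
    # bracketing: "(Attack trigger: fired (Attacker: troops) (Place: city))"
    return event_model(transcript, max_new_tokens=128)[0]["generated_text"]

print(extract_events("sample_news_clip.wav"))  # hypothetical audio file
```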