Dipanjan Saha


2025

IWSLT 2025 Indic Track System Description Paper: Speech-to-Text Translation from Low-Resource Indian Languages (Bengali and Tamil) to English
Sayan Das | Soham Chaudhuri | Dipanjan Saha | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)

Multi-language Speech-to-Text Translation (ST) plays a crucial role in breaking linguistic barriers, particularly in multilingual regions like India. This paper focuses on building a robust ST system for low-resource Indian languages, with a special emphasis on Bengali and Tamil. These languages represent the Indo-Aryan and Dravidian families, respectively. The dataset used in this work comprises spoken content from TED Talks and conferences, paired with transcriptions in English and their translations in Bengali and Tamil. Our work specifically addresses the translation of Bengali and Tamil speech to English text, a critical area given the scarcity of annotated speech data. To enhance translation quality and model robustness, we leverage cross-lingual resources and word-level translation strategies. The ultimate goal is to develop an end-to-end ST model capable of real-world deployment for underrepresented languages.
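As a rough illustration of the kind of end-to-end pipeline described above, the sketch below decodes Bengali speech directly into English text with a pretrained Whisper checkpoint via Hugging Face transformers. The paper does not name its exact architecture, so the model choice, file name, and options here are assumptions rather than the authors' setup.

```python
# Hedged sketch: Bengali speech -> English text with a pretrained Whisper
# model. Whisper supports an X->English "translate" task for many languages,
# including Bengali ("bn") and Tamil ("ta"). Checkpoint size is illustrative.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
)

# Decode a Bengali utterance directly into English (hypothetical audio file).
result = asr(
    "bengali_talk.wav",
    generate_kwargs={"task": "translate", "language": "bengali"},
)
print(result["text"])
```

Swapping `"language": "bengali"` for `"tamil"` would cover the paper's second source language with the same pipeline.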

JUNLP_Sarika at SemEval-2025 Task 11: Bridging Contextual Gaps in Text-Based Emotion Detection using Transformer Models
Sarika Khatun | Dipanjan Saha | Dipankar Das
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Because language is subjective, it can be difficult to infer human emotions from textual data. This work investigates the categorization of emotions using BERT, classifying five emotions (anger, fear, joy, sadness, and surprise) by utilizing its contextual embeddings. Preprocessing techniques such as tokenization and stop-word removal are applied to the dataset, which comes from social media posts and personal narratives. Our model was trained using a multi-label classification strategy and achieved a weighted F1-score of 0.75. BERT scores lowest on anger but performs well at identifying fear and surprise. The findings demonstrate the difficulties presented by unbalanced datasets while also highlighting the promise of transformer-based models for text-based emotion identification. Future research will use data augmentation methods, domain-adapted BERT models, and other methods to improve classification performance.
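The multi-label setup described above can be approximated with a standard Hugging Face BERT classifier. This is a minimal sketch, assuming bert-base-uncased and a 0.5 decision threshold; both are illustrative choices, not the paper's exact configuration.

```python
# Hedged sketch of multi-label emotion classification with BERT.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

labels = ["anger", "fear", "joy", "sadness", "surprise"]

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # trains with BCEWithLogitsLoss
)

batch = tokenizer("I can't believe this happened!", return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits

# A text may carry several emotions at once, so threshold each label
# independently instead of taking a single argmax.
probs = torch.sigmoid(logits)[0]
predicted = [lab for lab, p in zip(labels, probs) if p > 0.5]
print(predicted)
```

The independent per-label thresholds are what make this a multi-label rather than multi-class formulation, matching the training strategy named in the abstract.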

SpeechEE@XLLM25: End-to-End Structured Event Extraction from Speech
Soham Chaudhuri | Diganta Biswas | Dipanjan Saha | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 1st Joint Workshop on Large Language Models and Structure Modeling (XLLM 2025)

Event extraction from text is a complex task that involves the identification of event triggers and their supporting arguments. When applied to speech, this task becomes even more challenging due to the continuous nature of audio signals and the need for robust Automatic Speech Recognition (ASR). This paper proposes an approach that integrates ASR with event extraction by utilizing the Whisper model for speech recognition and a Text2Event2 Transformer for extracting events from English audio samples. The Whisper model is used to generate transcripts from audio, which are then fed into the Text2Event2 Transformer to identify event triggers and their arguments. This approach combines two difficult tasks into one, streamlining the process of extracting structured event information directly from audio. Our approach leverages a robust ASR system (Whisper) followed by a parameter-efficient transformer (Text2Event2 fine-tuned via LoRA) to extract structured events from raw speech. Unlike prior work trained on gold textual input, our pipeline is trained end-to-end on noisy ASR outputs. Despite significant resource constraints and data noise, our system ranked first in the ACL 2025 XLLM Shared Task II.
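A rough sketch of the two-stage pipeline follows, assuming Text2Event2 is a T5-style seq2seq model; the base checkpoint, audio file name, and LoRA hyperparameters below are assumptions for illustration, not the paper's exact setup.

```python
# Hedged sketch: Whisper ASR -> LoRA-adapted seq2seq event extraction.
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model

# Stage 1: transcribe the audio with Whisper.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
transcript = asr("meeting_clip.wav")["text"]  # hypothetical audio file

# Stage 2: parameter-efficient event extraction. We stand in a plain T5
# checkpoint for the Text2Event2 architecture (an assumption).
tokenizer = AutoTokenizer.from_pretrained("t5-base")
base = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q", "v"],
                  task_type="SEQ_2_SEQ_LM")
model = get_peft_model(base, lora)  # fine-tune this on noisy ASR transcripts

inputs = tokenizer(transcript, return_tensors="pt")
event_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(event_ids[0], skip_special_tokens=True))
```

Training the extractor on Whisper's own (noisy) transcripts rather than gold text, as the abstract describes, is what lets the two stages behave as one end-to-end system at inference time.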

2023

Transfer learning in low-resourced MT: An empirical study
Sainik Kumar Mahata | Dipanjan Saha | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

Translation systems rely on a large, good-quality parallel corpus for producing reliable translations. However, obtaining such a corpus for low-resourced languages is a challenge. New research has shown that transfer learning can mitigate this issue by augmenting low-resourced MT systems with high-resourced ones. In this work, we explore two types of transfer learning techniques, namely, cross-lingual transfer learning and multilingual training, both with information augmentation, to examine the degree of performance improvement following the augmentation. Furthermore, we use languages of the same family (Romance, in our case) to investigate the role of shared linguistic properties in producing dependable translations.
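As a schematic of the cross-lingual transfer technique described above, the sketch below continues training an NMT model pretrained on a high-resource Romance pair on a tiny related-language corpus. The checkpoint, toy data, and hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch: parent-child transfer for low-resource MT.
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

# Parent model: a high-resource Romance pair (e.g. Spanish -> English).
checkpoint = "Helsinki-NLP/opus-mt-es-en"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Toy stand-in for the small related-language corpus (e.g. Romanian source),
# so shared Romance vocabulary and parameters can be reused.
raw = Dataset.from_dict({
    "src": ["O carte bun\u0103."],
    "tgt": ["A good book."],
})

def preprocess(batch):
    enc = tokenizer(batch["src"], truncation=True)
    enc["labels"] = tokenizer(text_target=batch["tgt"], truncation=True)["input_ids"]
    return enc

train_ds = raw.map(preprocess, batched=True, remove_columns=["src", "tgt"])

args = Seq2SeqTrainingArguments(
    output_dir="transfer-mt",
    learning_rate=2e-5,   # small LR so the child step keeps parent knowledge
    num_train_epochs=5,
)
trainer = Seq2SeqTrainer(model=model, args=args, train_dataset=train_ds,
                         data_collator=DataCollatorForSeq2Seq(tokenizer, model=model))
trainer.train()
```

The small learning rate in the child step reflects the usual transfer-learning trade-off: adapt to the low-resource pair without overwriting what the parent pair taught the model.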