Shafin Rahman
Large Language Models (LLMs) have achieved great success in tasks such as sentiment analysis, machine translation, and question answering, yet their effectiveness in the multilingual financial domain remains under-explored. This study investigates the potential of generative LLMs for classifying financial sustainability in four diverse languages: English, Hindi, Bengali, and Telugu, representing low-, medium-, and high-resource language categories. We propose a novel fine-tuning approach that integrates both positive and negative rationales alongside classification labels. Unlike existing approaches, our method improves classification performance by incorporating structured bidirectional reasoning into financial decision-making. Extensive evaluations demonstrate that the proposed approach consistently outperforms prior methods across all four languages, establishing new benchmark results for multilingual financial NLP. Notably, it also enables smaller models to achieve competitive or even superior performance compared with significantly larger models fine-tuned using conventional methods, demonstrating its suitability for industry applications.
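The rationale-augmented fine-tuning described above can be pictured with a short sketch. Everything below is illustrative: the prompt wording, field names, and the build_example helper are assumptions, not the paper's released data format.

```python
# Illustrative sketch of rationale-augmented fine-tuning targets: each
# training example pairs the classification label with a positive rationale
# (why the label holds) and a negative rationale (why the alternative does
# not). All names and wording here are assumptions.

def build_example(text, label, pos_rationale, neg_rationale):
    """Serialise one example as a prompt/target pair for causal-LM fine-tuning."""
    prompt = (
        "Classify the financial text as sustainable or not sustainable.\n"
        f"Text: {text}\nAnswer:"
    )
    target = (
        f" {label}\n"
        f"Positive rationale: {pos_rationale}\n"  # why the label holds
        f"Negative rationale: {neg_rationale}"    # why the alternative does not
    )
    return {"prompt": prompt, "target": target}

example = build_example(
    text="The firm cut emissions by 12% while growing revenue.",
    label="sustainable",
    pos_rationale="Emission cuts alongside growth indicate durable practices.",
    neg_rationale="No evidence of greenwashing or one-off accounting effects.",
)
print(example["prompt"] + example["target"])
```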
Interpreting visual scenes with high-level reasoning is essential for many real-world applications, such as autonomous systems and content moderation, but training and maintaining Vision–Language Models (VLMs) remains resource-intensive and opaque. In this work, we present CAPSTONE, a lightweight, modular framework designed for industrial settings. Instead of relying on multimodal training or fine-tuning large models, CAPSTONE transforms outputs from off-the-shelf vision models into structured text prompts that can be interpreted by a frozen Large Language Model (LLM). This plug-and-play architecture enables reasoning over visual input without access to raw pixels, dramatically reducing computational cost and complexity. On the POPE dataset, our system, using a 7B LLM, outperforms several fully trained VLMs in zero-shot evaluations, while on the VSR benchmark, the 4B model achieves competitive results, together demonstrating strong generalization without retraining. CAPSTONE offers a scalable and interpretable alternative for companies looking to integrate visual reasoning capabilities without the burden of full-scale VLM pipelines.
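A minimal sketch of the plug-and-play idea follows, assuming a generic detector output format; the prompt wording and the detections_to_prompt helper are hypothetical, not CAPSTONE's exact specification.

```python
# Hypothetical prompt-construction step: detector outputs become structured
# text for a frozen LLM. Detection format and wording are assumptions.

def detections_to_prompt(detections, question):
    """Serialise (label, bbox, score) detections into an LLM prompt."""
    lines = [
        f"- {d['label']} at bbox {d['bbox']} (confidence {d['score']:.2f})"
        for d in detections
    ]
    return (
        "You are given object detections from an image (no raw pixels).\n"
        "Objects:\n" + "\n".join(lines) +
        f"\nQuestion: {question}\nAnswer:"
    )

prompt = detections_to_prompt(
    detections=[
        {"label": "dog", "bbox": (34, 50, 210, 300), "score": 0.97},
        {"label": "frisbee", "bbox": (180, 40, 230, 90), "score": 0.88},
    ],
    question="Is there a cat in the image?",
)
print(prompt)  # this string would then be sent to any frozen 4B/7B LLM
```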
Large Language Models (LLMs) excel at complex reasoning tasks, yet their performance hinges on the quality of their prompts and pipeline structures. Manual prompt design, as used in frameworks like DSPy, poses significant limitations: it is time-intensive, demands substantial expertise, and lacks scalability, restricting the widespread use of LLMs across diverse applications. To overcome these challenges, we introduce AutoDSPy, the first framework to fully automate DSPy pipeline construction using reinforcement learning (RL). AutoDSPy leverages an RL-tuned policy network to dynamically select optimal reasoning modules (such as Chain-of-Thought for logical tasks or ReAct for tool integration), along with input-output signatures and execution strategies, entirely eliminating the need for manual configuration. Experimental results on the GSM8K and HotPotQA benchmarks demonstrate that AutoDSPy outperforms traditional DSPy baselines, achieving accuracy gains of up to 4.3% while reducing inference time, even with smaller models like GPT-2 (127M). By integrating RL-based automation, AutoDSPy enhances both efficiency and accessibility, simplifying the development of structured, high-performing LLM solutions and enabling scalability across a wide range of tasks.
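The core mechanism, a policy network trained with RL to pick a reasoning module, can be caricatured as below. This is a toy REINFORCE sketch under assumed module names, task features, and reward, not AutoDSPy's implementation.

```python
# Toy REINFORCE sketch of an RL-tuned policy choosing a reasoning module.
# Module names, task features, and the reward signal are placeholders.
import torch
import torch.nn as nn

MODULES = ["chain_of_thought", "react", "program_of_thought"]

class ModulePolicy(nn.Module):
    def __init__(self, feat_dim=16, n_modules=len(MODULES)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, n_modules)
        )

    def forward(self, task_features):
        # Distribution over candidate modules for this task.
        return torch.distributions.Categorical(logits=self.net(task_features))

policy = ModulePolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

task_features = torch.randn(16)          # stand-in for a task embedding
dist = policy(task_features)
action = dist.sample()                   # module picked for the pipeline
reward = 1.0                             # stand-in: accuracy of the built pipeline
loss = -dist.log_prob(action) * reward   # REINFORCE policy-gradient step
optimizer.zero_grad()
loss.backward()
optimizer.step()
print("selected module:", MODULES[action.item()])
```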
Empathy is crucial in numerous social interactions, including human-robot, patient-doctor, teacher-student, and customer-call centre conversations. Despite its importance, empathy detection in videos remains a challenging and under-explored task because of the subjective nature of empathy. Existing studies have relied on scripted or semi-scripted interactions in text-, audio-, or video-only settings that fail to capture the complexities and nuances of real-life interactions. This PhD research aims to fill these gaps by developing a multimodal language model (MMLM) that detects empathy in audiovisual data. In addition to leveraging existing datasets, the proposed study involves collecting real-life interaction video and audio. It will apply optimisation techniques such as neural architecture search to deliver an optimised small-scale MMLM. Successful implementation of this project has significant implications for enhancing the quality of social interactions, as it enables real-time measurement of empathy and thus opens potential avenues for training towards better empathy in interactions.
Empathy – encompassing understanding and supporting others' emotions and perspectives – strengthens various social interactions, including written communication in healthcare, education and journalism. Detecting empathy with AI models that rely on self-assessed, crowdsourced ground truth is challenging due to the inherent noise in such annotations. To this end, we propose a novel Large Language Model-Guided Empathy (LLM-GEm) prediction system. It rectifies annotation errors based on our defined annotation selection threshold and makes the annotations reliable for conventional empathy prediction models, e.g., BERT-based pre-trained language models (PLMs). Previously, demographic information was often integrated numerically into empathy detection models. In contrast, our LLM-GEm leverages the GPT-3.5 LLM to convert numerical data into semantically meaningful textual sequences, enabling seamless integration into PLMs. We experiment with three NewsEmpathy datasets involving people's empathy levels towards newspaper articles and achieve state-of-the-art test performance using a RoBERTa-based PLM. Code and evaluations are publicly available at https://github.com/hasan-rakibul/LLM-GEm.
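The numeric-to-text step can be sketched as below; the template wording and the verbalise_demographics helper are assumptions standing in for the GPT-3.5-generated sequences that LLM-GEm actually uses.

```python
# Illustrative numeric-to-text conversion: demographic numbers are
# verbalised into a sentence and prepended to the essay before it reaches a
# RoBERTa-style PLM. The template and field names are assumptions.

def verbalise_demographics(age, gender, income, education):
    return (
        f"The writer is a {age}-year-old {gender} with an annual income of "
        f"${income:,} and an education level of {education}."
    )

def build_plm_input(essay, demographics):
    # Textual demographics + essay, ready for a RoBERTa tokenizer.
    return verbalise_demographics(**demographics) + " " + essay

text = build_plm_input(
    essay="Reading about the flood victims left me heartbroken.",
    demographics={"age": 34, "gender": "female", "income": 52000, "education": 6},
)
print(text)
```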
The WASSA 2023 shared task on predicting empathy, emotion and other personality traits consists of essays, conversations and articles in textual form and participants' demographic information in numerical form. To address the tasks, our contributions include (1) converting numerical information into meaningful text information using appropriate templates, (2) summarising lengthy articles, and (3) augmenting training data by paraphrasing. To achieve these contributions, we leveraged two separate T5-based pre-trained transformers. We then fine-tuned pre-trained BERT, DistilBERT and ALBERT for predicting empathy and personality traits. We used the Optuna hyperparameter optimisation framework to fine-tune learning rates, batch sizes and weight initialisation. Our proposed system achieved its highest performance – a Pearson correlation coefficient of 0.750 – on the conversation-level empathy prediction task. The system implementation is publicly available at https://github.com/hasan-rakibul/WASSA23-empathy-emotion.
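A minimal Optuna sketch of the search over learning rate, batch size, and weight initialisation follows; the objective here is a stub, where the real system would fine-tune BERT/DistilBERT/ALBERT and return the validation Pearson correlation.

```python
# Minimal Optuna sketch (assumed setup, not the authors' exact script).
import optuna

def train_and_evaluate(lr, batch_size, seed):
    """Stub standing in for PLM fine-tuning; returns a fake validation score."""
    return 0.75 - abs(lr - 2e-5) * 500 + 0.001 * (batch_size == 16) - 1e-4 * seed

def objective(trial):
    # Search spaces mirror the abstract: learning rate, batch size, weight init.
    lr = trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True)
    batch_size = trial.suggest_categorical("batch_size", [8, 16, 32])
    seed = trial.suggest_int("init_seed", 0, 4)
    return train_and_evaluate(lr, batch_size, seed)

study = optuna.create_study(direction="maximize")  # maximise Pearson r
study.optimize(objective, n_trials=20)
print(study.best_params)
```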