Diya Saha


2025

pdf bib
Benchmarking Bangla Causality: A Dataset of Implicit and Explicit Causal Sentences and Cause-Effect Relations
Diya Saha | Sudeshna Jana | Manjira Sinha | Tirthankar Dasgupta
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

Causal reasoning is central to language understanding, yet remains under-resourced in Bangla. In this paper, we introduce the first large-scale dataset for causal inference in Bangla, consisting of over 11663 sentences annotated for causal sentence types (explicit, implicit, non-causal) and token-level spans for causes, effects, and connectives. The dataset captures both simple and complex causal structures across diverse domains such as news, education, and health. We further benchmark a suite of state-of-the-art instruction-tuned large language models, including LLaMA 3.3 70B, Gemma 2 9B, Qwen 32B, and DeepSeek, under zero-shot and three-shot prompting conditions. Our analysis reveals that while LLMs demonstrate moderate success in explicit causality detection, their performance drops significantly on implicit and span-level extraction tasks. This work establishes a foundational resource for Bangla causal understanding and highlights key challenges in adapting multilingual LLMs for structured reasoning in low-resource languages.

2024

pdf bib
EnClaim: A Style Augmented Transformer Architecture for Environmental Claim Detection
Diya Saha | Manjira Sinha | Tirthankar Dasgupta
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)

Across countries, a noteworthy paradigm shift towards a more sustainable and environmentally responsible economy is underway. However, this positive transition is accompanied by an upsurge in greenwashing, where companies make exaggerated claims about their environmental commitments. To address this challenge and protect consumers, initiatives have emerged to substantiate green claims. With the proliferation of environmental and scientific assertions, a critical need arises for automated methods to detect and validate these claims at scale. In this paper, we introduce EnClaim, a transformer network architecture augmented with stylistic features for automatically detecting claims from open web documents or social media posts. The proposed model considers various linguistic stylistic features in conjunction with language models to predict whether a given statement constitutes a claim. We have rigorously evaluated the model using multiple open datasets. Our initial findings indicate that incorporating stylistic vectors alongside the BERT-based language model enhances the overall effectiveness of environmental claim detection.