Manjira Sinha

2024

EnClaim: A Style Augmented Transformer Architecture for Environmental Claim Detection
Diya Saha | Manjira Sinha | Tirthankar Dasgupta
Proceedings of the 1st Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2024)

Across countries, a noteworthy paradigm shift towards a more sustainable and environmentally responsible economy is underway. However, this positive transition is accompanied by an upsurge in greenwashing, where companies make exaggerated claims about their environmental commitments. To address this challenge and protect consumers, initiatives have emerged to substantiate green claims. With the proliferation of environmental and scientific assertions, a critical need arises for automated methods to detect and validate these claims at scale. In this paper, we introduce EnClaim, a transformer network architecture augmented with stylistic features for automatically detecting claims from open web documents or social media posts. The proposed model considers various linguistic stylistic features in conjunction with language models to predict whether a given statement constitutes a claim. We have rigorously evaluated the model using multiple open datasets. Our initial findings indicate that incorporating stylistic vectors alongside the BERT-based language model enhances the overall effectiveness of environmental claim detection.

FORCE: A Benchmark Dataset for Foodborne Disease Outbreak and Recall Event Extraction from News
Sudeshna Jana | Manjira Sinha | Tirthankar Dasgupta
Proceedings of The 9th Social Media Mining for Health Research and Applications (SMM4H 2024) Workshop and Shared Tasks

The escalating prevalence of food safety incidents within the food supply chain necessitates immediate action to protect consumers. These incidents encompass a spectrum of issues, including food product contamination and deliberate food and feed adulteration for economic gain, leading to outbreaks and recalls. Understanding the origins and pathways of contamination is imperative for prevention and mitigation. In this paper, we introduce FORCE (Foodborne disease Outbreak and ReCall Event extraction from openweb). Our proposed model leverages a multi-tasking sequence labeling architecture in conjunction with transformer-based document embeddings. We have compiled a substantial annotated corpus comprising relevant articles published between 2011 and 2023 to train and evaluate the model. The dataset will be publicly released with the paper. The event detection model demonstrates fair accuracy in identifying food-related incidents and outbreaks associated with organizations, as assessed through cross-validation techniques.

Linguistically Informed Transformers for Text to American Sign Language Translation
Abhishek Varanasi | Manjira Sinha | Tirthankar Dasgupta
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)

In this paper, we propose a framework for automatic translation of English text to American Sign Language (ASL) which leverages a linguistically informed transformer model to translate English sentences into ASL gloss sequences. These glosses are then associated with respective ASL videos, effectively representing English text in ASL. To facilitate experimentation, we create an English-ASL parallel dataset in the banking domain. Our preliminary results demonstrate that the linguistically informed transformer model achieves a 97.83% ROUGE-L score for text-to-gloss translation on the ASLG-PC12 dataset. Furthermore, the transformer model fine-tuned on the ASLG-PC12 + banking domain dataset yields an 89.47% ROUGE-L score. These results demonstrate the effectiveness of the linguistically informed model for both general and domain-specific translations. To facilitate parallel dataset generation in the banking domain, we choose ASL despite it having limited benchmarks and data corpora compared to some of the other sign languages.

Exploring Language Models to Analyze Market Demand Sentiments from News
Tirthankar Dasgupta | Manjira Sinha
Proceedings of the 14th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Obtaining demand trends for products is an essential aspect of supply chain planning. It helps in generating scenarios for simulation before actual demands start pouring in. Presently, experts obtain this information manually from different news sources. In this paper, we present methods that can automate the information acquisition process. We present a joint framework that performs information extraction and sentiment analysis to acquire demand-related information from business text documents. The proposed system leverages a TwinBERT-based deep neural network model to first extract the product with which demand is associated and then identify the respective sentiment polarity. The articles are also subjected to causal analytics, which together yield rich contextual information about the reasons for the rise or fall in demand for various products. The enriched information is targeted at decision-makers, analysts and knowledge workers. We have exhaustively evaluated our proposed models with datasets curated and annotated for two different domains, namely the automobile sector and housing. The proposed model outperforms the existing baseline systems.

2023

Dy-poThon: A Bangla Sentence-Learning System for Children with Dyslexia
Dipshikha Podder | Manjira Sinha | Tirthankar Dasgupta | Anupam Basu
Proceedings of the 20th International Conference on Natural Language Processing (ICON)

The number of assistive technologies available for dyslexia in Bangla is low, and most of them do not use multisensory teaching methods. As a solution, a computer-based audio-visual system, Dy-poThon, is proposed to teach sentence reading in Bangla. It incorporates the multisensory teaching method through three activities (listening, reading, and writing), checks the reading and writing ability of the user, and tracks the response time. A criteria-based evaluation was conducted with 28 special educators to evaluate Dy-poThon. Content, efficiency, ease of use and aesthetics were evaluated using a standardised questionnaire. The results suggest that Dy-poThon is useful for teaching Bangla sentence-reading.

In this paper, we have developed an open-source online computational framework that can be used by different research groups to conduct reading research on Indian language texts. The framework can be used to develop a large annotated dataset of Indian language text comprehension from different user-based experiments. The novelty of this framework lies in the fact that it brings different empirical data-collection techniques for text comprehension under one roof. The framework has been customized specifically to address the particularities of Indian languages. It will also offer many types of automatic analysis on the data at different levels, such as the full-text, sentence and word levels. To address the subjectivity of text difficulty perception, the framework allows user background to be captured against multiple factors. The assimilated data can be automatically cross-referenced against varying strata of readers.

Text Readability in Hindi: A Comparative Study of Feature Performances Using Support Vectors
Manjira Sinha | Tirthankar Dasgupta | Anupam Basu
Proceedings of the 11th International Conference on Natural Language Processing

Influence of Target Reader Background and Text Features on Text Readability in Bangla: A Computational Approach
Manjira Sinha | Tirthankar Dasgupta | Anupam Basu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers