Pritam Pal


2025

Top Ten from Lakhs: A Transformer-based Retrieval System for Identifying Previously Fact-Checked Claims across Multiple Languages
Srijani Debnath | Pritam Pal | Dipankar Das
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Efficiently identifying previously fact-checked claims across multiple languages is a challenging task. It can be time-consuming for professional fact-checkers even within a single language, and it becomes much harder to perform manually when the claim and the fact-check are in different languages. This paper presents a systematic approach for retrieving the top-k relevant fact-checks for a given post in monolingual and cross-lingual setups. It uses two transformer-based fact-checked claim retrieval frameworks that share a common preprocessing pipeline but differ in their underlying encoder implementations: TIDE, a TensorFlow-based custom dual encoder applied to English-translated data, and PTEX, a PyTorch-based encoder operating on both English-translated and original-language inputs. It also introduces a lightweight post-processing technique based on a textual feature, Keyword Overlap Count, applied via reranking on top of the transformer-based frameworks. Training and evaluation on a large multilingual corpus show that the fine-tuned E5-Large-v2 model in the PTEX framework yields the best monolingual track performance, with an average Success@10 score of 0.8846. The same model with the post-processing technique achieves an average Success@10 score of 0.7393, the best performance in the cross-lingual track.
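The abstract does not spell out how Keyword Overlap Count reranking is computed, so the following is only a rough sketch of one plausible reading: count the keywords a post shares with each retrieved fact-check, then reorder the candidates by that count. The tokenizer, the tiny `STOPWORDS` set, the function names, and the tie-break on the encoder score are all assumptions made for illustration, not the authors' implementation.

```python
import re
from typing import List, Set, Tuple

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "to", "and"}


def keywords(text: str) -> Set[str]:
    """Lowercase the text, split on word characters, drop stop words."""
    return {t for t in re.findall(r"\w+", text.lower()) if t not in STOPWORDS}


def rerank_by_keyword_overlap(
    post: str, candidates: List[Tuple[str, float]]
) -> List[Tuple[str, float]]:
    """Reorder (fact_check_text, encoder_score) pairs.

    Candidates sharing more keywords with the post come first;
    ties are broken by the higher encoder score.
    """
    post_kw = keywords(post)
    return sorted(
        candidates,
        key=lambda c: (-len(post_kw & keywords(c[0])), -c[1]),
    )


post = "Vaccine causes autism in children"
candidates = [
    ("The moon landing was staged", 0.9),
    ("No evidence that vaccine causes autism", 0.8),
]
reranked = rerank_by_keyword_overlap(post, candidates)
```

In this toy example the second candidate shares three keywords with the post and moves to the top despite its lower encoder score.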

Toward Quantum-Enhanced Natural Language Understanding: Sarcasm and Claim Detection with QLSTM
Pritam Pal | Dipankar Das
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Traditional machine learning (ML) and deep learning (DL) models have shown effectiveness in natural language processing (NLP) tasks, such as sentiment analysis. However, they often struggle with complex linguistic structures, such as sarcasm and implicit claims. This paper introduces a Quantum Long Short-Term Memory (QLSTM) framework for detecting sarcasm and identifying claims in text, aiming to enhance the analysis of complex sentences. We evaluate four approaches: (1) classical LSTM, (2) quantum framework using QLSTM, (3) voting ensemble combining classical and quantum LSTMs, and (4) hybrid framework integrating both types. The experimental results indicate that the QLSTM approach excels in sarcasm detection, while the voting framework performs best in claim identification.

Enhancing Textual Understanding: Automated Claim Span Identification in English, Hindi, Bengali, and CodeMix
Rudra Roy | Pritam Pal | Dipankar Das | Saptarshi Ghosh | Biswajit Paul
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era

Claim span identification, a crucial task in Natural Language Processing (NLP), aims to extract specific claims from texts. Such claim spans can be further utilized in various critical NLP applications, such as claim verification, fact-checking, and opinion mining. The present work proposes a multilingual claim span identification framework for handling social media data in English, Hindi, Bengali, and CodeMixed texts, leveraging the strengths and knowledge of transformer-based pre-trained models. Our proposed framework efficiently identifies the contextual relationships between words and precisely detects claim spans across all languages, achieving high F1 and Jaccard scores. The source code and datasets are available at: https://github.com/pritampal98/claim-span-multilingual

JU_NLP at SemEval-2025 Task 7: Leveraging Transformer-Based Models for Multilingual & Crosslingual Fact-Checked Claim Retrieval
Atanu Nayak | Srijani Debnath | Arpan Majumdar | Pritam Pal | Dipankar Das
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Fact-checkers are often hampered by the sheer amount of online content that needs to be fact-checked. NLP can help them by retrieving already existing fact-checks relevant to the content being investigated. This paper presents a systematic approach for retrieving the top-k relevant fact-checks for a given post in monolingual and cross-lingual setups using transformer-based pre-trained models fine-tuned with a dual encoder architecture. Evaluated on the shared task test dataset, our best-performing framework achieved average success@10 scores of 0.79 and 0.62 for retrieving 10 fact-checks per post from the fact-check corpus in the monolingual and crosslingual tracks, respectively.
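The abstract reports success@10 without defining it. Under the usual reading, a post counts as a success if at least one of its relevant fact-checks appears among the top k retrieved, and the metric is the fraction of such posts. The sketch below follows that reading; the function name and the dictionary-based data layout are assumptions for illustration, not the shared task's official scorer.

```python
from typing import Dict, List, Set


def success_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Fraction of posts with at least one relevant fact-check in the top k.

    ranked:   post id -> fact-check ids, best first
    relevant: post id -> set of gold fact-check ids
    """
    if not ranked:
        return 0.0
    hits = 0
    for post_id, ranking in ranked.items():
        gold = relevant.get(post_id, set())
        # A hit requires any gold id among the first k retrieved ids.
        if any(fc_id in gold for fc_id in ranking[:k]):
            hits += 1
    return hits / len(ranked)


ranked = {"p1": ["a", "b", "c"], "p2": ["d", "e", "f"]}
relevant = {"p1": {"c"}, "p2": {"z"}}
score_at_3 = success_at_k(ranked, relevant, k=3)
score_at_2 = success_at_k(ranked, relevant, k=2)
```

With these toy inputs, only `p1` has a gold fact-check in its top 3, and neither post has one in its top 2.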

2024

Human vs Machine: An Automated Machine-Generated Text Detection Approach
Urwah Jawaid | Rudra Roy | Pritam Pal | Srijani Debnath | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

With the advancement of natural language processing (NLP) and sophisticated Large Language Models (LLMs), distinguishing between human-written and machine-generated texts has become quite difficult. This paper presents a systematic approach to distinguishing machine-generated text from human-written text by combining a transformer-based model with a textual feature-based post-processing technique. We extracted five textual features (readability score, stop word score, spelling and grammatical error count, unique word score, and human phrase count) from human-written and machine-generated texts separately and trained three machine learning models (SVM, Random Forest, and XGBoost) on these scores. Along with traditional machine learning models, we explored BiLSTM and the transformer-based DistilBERT model to enhance classification performance. Trained and evaluated on a large dataset containing both human-written and machine-generated text, our best-performing framework achieves an accuracy of 87.5%.

Unveiling the Truth: A Deep Dive into Claim Identification Methods
Shankha Shubhra Das | Pritam Pal | Dipankar Das
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation