Pritam Pal


2025

IndicClaimBuster: A Multilingual Claim Verification Dataset
Pritam Pal | Shyamal Krishna Jana | Dipankar Das
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics

This article introduces **IndicClaimBuster**, a novel multilingual claim verification dataset comprising 9K claims and their corresponding evidence in English, Hindi, Bengali, and Hindi-English code-mixed text. The dataset covers three key domains (politics, law and order, and health) to address the challenges of verifying facts. Each claim was sourced from reputable Indian news portals and is accompanied by three pieces of evidence: two LLM-generated and one manually curated. In a separate step, refuted claims were generated using an LLM. We further develop two frameworks: an unsupervised baseline and a two-stage pipeline comprising evidence retrieval and veracity prediction modules. For retrieval, we fine-tuned SBERT models, with e5-base demonstrating the best average performance across languages; for veracity prediction, we fine-tuned multilingual transformers (mBERT, XLM-R, MuRIL, IndicBERTv2). Results indicate that MuRIL and IndicBERTv2 excel on Indian languages, while XLM-R performs best on code-mixed text. Our work contributes a high-quality multilingual dataset and strong baseline methodologies, offering valuable resources for advancing automated claim verification in linguistically diverse, low-resource settings for Indian languages. The IndicClaimBuster dataset is available at: https://github.com/pritampal98/indic-claim-buster

JU_NLP at SemEval-2025 Task 7: Leveraging Transformer-Based Models for Multilingual & Crosslingual Fact-Checked Claim Retrieval
Atanu Nayak | Srijani Debnath | Arpan Majumdar | Pritam Pal | Dipankar Das
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Fact-checkers are often hampered by the sheer amount of online content that needs to be fact-checked. NLP can help them by retrieving existing fact-checks relevant to the content being investigated. This paper presents a systematic approach for retrieving the top-k relevant fact-checks for a given post in monolingual and crosslingual setups, using transformer-based pre-trained models fine-tuned with a dual encoder architecture. Evaluated on the shared task test dataset, our best-performing framework achieved average success@10 scores of 0.79 and 0.62 in the monolingual and crosslingual tracks, respectively, when retrieving 10 fact-checks from the fact-check corpus for a given post.
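To make the reported metric concrete, the following is a minimal sketch of how a success@k score could be computed. It assumes that a post counts as a hit when at least one relevant fact-check appears among its top k retrieved candidates; the function name `success_at_k` and the toy identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def success_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Fraction of posts with at least one relevant fact-check in the top k.

    ranked   -- post id -> fact-check ids, best match first (assumed format)
    relevant -- post id -> set of gold fact-check ids (assumed format)
    """
    if not ranked:
        return 0.0
    hits = 0
    for post_id, candidates in ranked.items():
        gold = relevant.get(post_id, set())
        # A post is a hit if any of its top-k candidates is a gold fact-check.
        if any(candidate in gold for candidate in candidates[:k]):
            hits += 1
    return hits / len(ranked)


# Toy data (hypothetical ids): post "p1" is a hit at k=2, post "p2" is not.
toy_ranked = {"p1": ["f3", "f1", "f9"], "p2": ["f4", "f5", "f2"]}
toy_relevant = {"p1": {"f1"}, "p2": {"f2"}}
toy_score = success_at_k(toy_ranked, toy_relevant, k=2)
```

With `k=2`, only the first post has a gold fact-check in its top two candidates, so the toy score is one hit out of two posts.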

2024

Human vs Machine: An Automated Machine-Generated Text Detection Approach
Urwah Jawaid | Rudra Roy | Pritam Pal | Srijani Debnath | Dipankar Das | Sivaji Bandyopadhyay
Proceedings of the 21st International Conference on Natural Language Processing (ICON)

With the advancement of natural language processing (NLP) and sophisticated Large Language Models (LLMs), distinguishing human-written text from machine-generated text has become quite difficult. This paper presents a systematic approach to separating machine-generated text from human-written text by combining a transformer-based model with a textual feature-based post-processing technique. We extracted five textual features (readability score, stop word score, spelling and grammatical error count, unique word score, and human phrase count) from human-written and machine-generated texts separately, and trained three machine learning models (SVM, Random Forest, and XGBoost) on these scores. Beyond these traditional machine learning models, we explored BiLSTM and the transformer-based DistilBERT model to enhance classification performance. Trained and evaluated on a large dataset containing both human-written and machine-generated text, our best-performing framework achieves an accuracy of 87.5%.
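As an illustration of the kind of textual features listed above, here is a minimal sketch that computes two of them. The abstract does not give the exact definitions, so this assumes the stop word score is the share of tokens that are stop words and the unique word score is the ratio of distinct tokens to all tokens. The tiny `STOP_WORDS` set, the tokenizer, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def text_features(text: str) -> Dict[str, float]:
    """Return assumed definitions of the stop word and unique word scores."""
    tokens = tokenize(text)
    if not tokens:
        return {"stop_word_score": 0.0, "unique_word_score": 0.0}
    stop_count = sum(token in STOP_WORDS for token in tokens)
    return {
        # Share of tokens that are stop words.
        "stop_word_score": stop_count / len(tokens),
        # Distinct tokens divided by total tokens (type-token ratio).
        "unique_word_score": len(set(tokens)) / len(tokens),
    }


features = text_features("The cat is on the mat")
```

Scores like these could then be stacked into a feature vector per document and passed to a classifier such as an SVM or a random forest, as the abstract describes.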

Unveiling the Truth: A Deep Dive into Claim Identification Methods
Shankha Shubhra Das | Pritam Pal | Dipankar Das
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation