CheckSent-BN: A Bengali Multi-Task Dataset for Claim Checkworthiness and Sentiment Classification from News Headlines

Pritam Pal, Dipankar Das


Abstract
This paper presents **CheckSent-BN** (Claim **Check**worthiness and **Sen**timent Classification in **B**engali **N**ews Headline), a novel multi-task dataset in Bengali comprising approximately 11.5K news headlines annotated for two critical natural language processing (NLP) tasks: claim checkworthiness detection and sentiment classification. To address the lack of high-quality annotated resources in Bengali, we employ a cost-effective annotation strategy that utilizes three large language models (GPT-4o-mini, GPT-4.1-mini, and Llama-4), followed by majority voting and manual verification to ensure label consistency. We provide benchmark results using multilingual and Bengali-focused transformer models under both single-task and multi-task learning (MTL) frameworks. Experimental results show that IndicBERTv2, BanglaBERT, and mDeBERTa model-based frameworks outperform other multilingual models, with IndicBERTv2 achieving the best overall performance in the MTL setting. CheckSent-BN establishes the first comprehensive benchmark for joint claim checkworthiness and sentiment classification in Bengali news headlines, offering a valuable resource for advancing misinformation detection and sentiment-aware analysis in low-resource languages. The CheckSent-BN dataset is available at: https://github.com/pritampal98/check-sent-bn
Anthology ID:
2025.banglalp-1.10
Volume:
Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025)
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Naeemul Hassan, Enamul Hoque Prince, Mohiuddin Tasnim, Md Rashad Al Hasan Rony, Md Tahmid Rahman Rahman
Venues:
BanglaLP | WS
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
119–130
Language:
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.banglalp-1.10/
DOI:
Bibkey:
Cite (ACL):
Pritam Pal and Dipankar Das. 2025. CheckSent-BN: A Bengali Multi-Task Dataset for Claim Checkworthiness and Sentiment Classification from News Headlines. In Proceedings of the Second Workshop on Bangla Language Processing (BLP-2025), pages 119–130, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
CheckSent-BN: A Bengali Multi-Task Dataset for Claim Checkworthiness and Sentiment Classification from News Headlines (Pal & Das, BanglaLP 2025)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.banglalp-1.10.pdf