Sajib Bhattacharjee

2025

CUET_SR34 at CQs-Gen 2025: Critical Question Generation via Few-Shot LLMs – Integrating NER and Argument Schemes
Sajib Bhattacharjee | Tabassum Basher Rashfi | Samia Rahman | Hasan Murad
Proceedings of the 12th Argument mining Workshop

Critical Question Generation (CQs-Gen) aims to improve reasoning and critical thinking skills through Critical Questions (CQs), which identify reasoning gaps and help address misinformation. This matters increasingly as LLM-based chat systems are widely used for learning and may encourage superficial learning habits. The Shared Task on Critical Question Generation, hosted at the 12th Workshop on Argument Mining and co-located with ACL 2025, aimed to address these challenges. This study proposes a CQs-Gen pipeline using Llama-3-8B-Instruct-GGUF-Q8_0 with few-shot learning, integrating text simplification, named entity recognition (NER), and argument schemes to enhance question quality. We compare three settings: inference without training, fine-tuning with PEFT using LoRA on 10% of the dataset, and few-shot fine-tuning (using five examples) with an 8-bit quantized model. The few-shot approach outperforms the others. On the validation set, 397 out of 558 generated CQs (71.1%) were classified as Useful. On the test set, by contrast, 49 out of 102 generated CQs (48%) were classified as Useful, following evaluation through semantic similarity and manual assessment.

Team ML_Forge@DravidianLangTech 2025: Multimodal Hate Speech Detection in Dravidian Languages
Adnan Faisal | Shiti Chowdhury | Sajib Bhattacharjee | Udoy Das | Samia Rahman | Momtazul Arefin Labib | Hasan Murad
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Ensuring a safe and inclusive online environment requires effective hate speech detection on social media. While detection systems have advanced significantly for English, many regional languages, including Malayalam, Tamil and Telugu, remain underrepresented, which makes it difficult to identify harmful content accurately. These languages present unique challenges due to their complex grammar, diverse dialects, and frequent code-mixing with English. The rise of multimodal content, including text and audio, adds further complexity to detection tasks. The shared task “Multimodal Hate Speech Detection in Dravidian Languages: DravidianLangTech@NAACL 2025” aimed to address these challenges. A YouTube-sourced dataset was provided, labeled into five categories: Gender (G), Political (P), Religious (R), Personal Defamation (C) and Non-Hate (NH). In our approach, we used mBERT and T5 for text, and Wav2Vec2 and Whisper for audio. T5 performed poorly compared to mBERT, which achieved the highest F1 scores on the test dataset. For audio, Wav2Vec2 was chosen over Whisper because it processes raw audio effectively using self-supervised learning. In the hate speech detection task, we achieved a macro F1 score of 0.2005 for Malayalam (ranked 15th), 0.1356 for Tamil (ranked 16th) and 0.1465 for Telugu (ranked 16th).
