Nahid Hossain

2026

AlienAnnotators at PsyDefDetect: What Lies Between the Lines: Probing Lightweight Open-Source LLMs for Psychological Defense Mechanism Detection
Siam Karip | Nahid Hossain
Proceedings of the BioNLP 2026 (Shared Tasks)

Detecting psychological defense mechanisms in therapy dialogue is a clinically valuable but computationally underexplored task. We present our systematic analysis for PsyDefDetect, a shared task at BioNLP@ACL 2026, which frames defense detection as a nine-class utterance-level classification problem based on the Defense Mechanism Rating Scale (DMRS). We systematically evaluate six open-source, instruction-tuned small language models (SLMs, ≤ 9B parameters) in zero-shot and fine-tuning settings, and compare a clinically-grounded prompt against the organizer-provided baseline. Our official submission achieved 59.96% accuracy and 16.28% Macro F1. Post-submission experiments show that fine-tuning combined with 5-fold cross-validation and logit averaging ensemble substantially improves performance, with the best configuration reaching 34.59% Macro F1 and 65.25% accuracy. We find that clinically-grounded prompts outperform bare label definitions, model scale does not consistently improve zero-shot performance, and fine-tuning dramatically recovers even collapsed zero-shot models. Certain defense tiers remain persistently difficult across all settings, pointing to clinical ambiguity at tier boundaries as a more fundamental bottleneck than data imbalance alone.

2025

Context Minimization for Resource-Constrained Text Classification: Optimizing Performance-Efficiency Trade-offs through Linguistic Features
Nahid Hossain | Md Faisal Kabir
Findings of the Association for Computational Linguistics: EMNLP 2025

Pretrained language models have transformed text classification, yet their computational demands often render them impractical for resource-constrained settings. We propose a linguistically-grounded framework for context minimization that leverages theme-rheme structure to preserve critical classification signals while reducing input complexity. Our approach integrates positional, syntactic, semantic, and statistical features, guided by functional linguistics, to identify optimal low-context configurations. We present a methodical iterative feature exploration protocol across 6 benchmarks, including our novel CMLA11 dataset. Results demonstrate substantial efficiency gains: 69-75% reduction in GPU memory, 81-87% decrease in training time, and 82-88% faster inference. Despite these resource savings, our configurations maintain near-parity with full-length inputs, with F1 (macro) reductions averaging just 1.39-3.10%. Statistical significance testing confirms minimal practical impact, with some configurations outperforming the baseline. SHAP analysis reveals specific feature subsets contribute most significantly across datasets, and these recurring configurations offer transferable insights, reducing the need for exhaustive feature exploration. Our method also yields remarkable data compression (72.57% average reduction, reaching 92.63% for longer documents). Ablation studies confirm synergistic feature contributions, establishing our context minimization as an effective solution for resource-efficient text classification with minimal performance trade-offs.

Co-authors

Md Faisal Kabir 1
Siam Karip 1

Venues
