Shuva Dey

2026

CUET_DiagNLP at #SMM4H-HeaRD 2026: Per-Axis TNM Staging from Pathology Reports and Opioid Impact Span Detection from Social Media
Shuva Dey | Priyangshu Barua | Mohammad Ashfak Habib
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks

In this paper, we describe systems for two #SMM4H-HeaRD 2026 shared tasks. Task 6 asks for per-axis TNM cancer staging from free-text TCGA pathology reports under severe label imbalance and long-document constraints. We fine-tune GatorTron-base separately on each axis using Focal loss with class weights and a pooled [CLS]–mean representation, reaching macro-F1 scores of 0.700 (T), 0.774 (N), and 0.640 (M) on test set 2, against baselines of 0.454, 0.591, and 0.554, respectively. Task 7 asks for span-level detection of opioid-related ClinicalImpacts and SocialImpacts in first-person Reddit posts. We combine DeBERTa-large and PubMedBERT (two seeds each) in a uniform-weight ensemble with boundary-aware loss, entity-replacement augmentation, and a first-person post filter, achieving a strict F1 of 0.51 and a relaxed F1 of 0.60, above both the task mean (0.46 / 0.55) and median (0.48 / 0.58).
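The Task 6 abstract names two ingredients without defining them: a class-weighted Focal loss and a pooled [CLS]–mean representation. The sketch below shows one common reading of each in plain Python. It is not the authors' code. Three choices are assumptions: per-class weights from inverse label frequency (N / (K * count_c)), a focusing parameter gamma = 2.0, and "pooled [CLS]–mean" read as the [CLS] vector concatenated with the mean of all token vectors (padding assumed already removed).

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(counts: Sequence[int]) -> List[float]:
    """Assumed weighting scheme: w_c = N / (K * count_c), so rare classes weigh more."""
    n, k = sum(counts), len(counts)
    return [n / (k * c) for c in counts]


def weighted_focal_loss(logits: Sequence[float], target: int,
                        class_weights: Sequence[float], gamma: float = 2.0) -> float:
    """Focal loss for one example: -w_t * (1 - p_t)^gamma * log(p_t).

    With gamma = 0 and all weights equal to 1 this reduces to cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


def cls_mean_pool(token_states: Sequence[Sequence[float]]) -> List[float]:
    """Assumed pooling: concatenate the [CLS] vector (position 0) with the
    mean over all token vectors, doubling the hidden size."""
    cls = list(token_states[0])
    n = len(token_states)
    mean = [sum(tok[d] for tok in token_states) / n for d in range(len(cls))]
    return cls + mean
```

For example, with hypothetical label counts of [50, 30, 15, 5] for four T classes, `inverse_frequency_weights` gives the rarest class a weight of 5.0 and the most frequent class a weight of 0.5. The (1 - p_t)^gamma factor then further down-weights examples the model already classifies confidently.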


Cuet_Neural_Navigators@DravidianLangTech 2026: Depression Detection from Malayalam and Tamil Speech using Self-Supervised Acoustic Models
Shuva Dey | Abir Dey | Sha Newaz Mahmud | Hasan Murad
Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages

Depression detection from speech aims to find signs of depression using behavioral signals. This approach enables early, scalable mental health screening. However, the task is difficult because of subtle acoustic cues, differences among speakers, and language-specific patterns. In this work, we introduce our system for the Shared Task on Depression Detection in Dravidian Languages (DD-DL) at DravidianLangTech@ACL 2026. We focus on speech in Tamil and Malayalam. We explore pretrained self-supervised speech encoders, including HuBERT, XLS-R, and Whisper, to identify acoustic patterns related to depression directly from raw audio. Our method combines these models through ensembling to capture different acoustic features. The experiments use stratified evaluation and cross-lingual analysis to check how well the models work across languages. Results show that pretrained acoustic representations effectively capture vocal features of depression, achieving Macro-F1 scores of 0.9058 for Tamil and 0.9396 for Malayalam. However, cross-lingual transfer faces challenges because of phonetic and prosodic differences.
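The DD-DL abstract says the HuBERT, XLS-R, and Whisper models are combined "through ensembling" and scored with Macro-F1, but it does not specify the combination rule. The sketch below assumes uniform soft voting, in which the per-class probabilities from each encoder's classifier head are averaged with equal weight and the final label is the argmax. It also shows how Macro-F1 averages per-class F1 scores. The labels and probabilities in the tests are hypothetical, and this is not the authors' implementation.

```python
from typing import List, Sequence


def uniform_ensemble(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Assumed soft voting: average per-class probabilities across models, equal weights."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    return [sum(p[c] for p in prob_lists) / n_models for c in range(n_classes)]


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value (the predicted class)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]) -> float:
    """Unweighted mean of per-class F1, where F1_c = 2TP / (2TP + FP + FN).

    A class with no true or predicted instances contributes 0.0.
    """
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)
```

Macro-F1 gives each class equal weight regardless of its frequency, which is why it is a common choice when depressed and non-depressed recordings are unevenly represented.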
