Symom Hossain Shohan
2026
CUETClashing at SemEval-2026 Task 1: Multilingual Joke Generation Under Lexical and Topical Constraints Using Small Instruction-Tuned LLMs
Madiha Ahmed Chowdhury | Lamia Khan | Faozia Fariha | Symom Hossain Shohan | Mohammed Moshiul Hoque
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Generating humorous text is one of the most challenging tasks in natural language generation, as models must simultaneously balance creativity, cultural understanding, and explicit constraints. This paper introduces our system for Subtask A of SemEval-2026 Task 1: MWAHAHA (Models Write Automatic Humor And Humans Annotate), which asks for single-sentence jokes under two constraints (certain words must be included, and the joke must relate to a news headline) in English, Spanish, and Chinese. Our method uses instruction-tuned language models, Qwen2.5-3B-Instruct for English and Chinese and Salamandra-2B-Instruct for Spanish, paired with language-specific prompts, customized sampling settings, and a robust cleaning step applied after the jokes are generated. Without additional task-specific training, our system generates jokes that adhere to the constraints in all three languages, demonstrating that simple prompt design and small instruction-tuned models can be a strong, efficient way to generate humorous text across multiple languages.
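The lexical constraint described above (certain words must appear in the joke) can be checked automatically. The following is a minimal sketch, not the authors' cleaning pipeline: the function name `contains_required_words`, the `lang` parameter, and the matching rules are all assumptions made for illustration.

```python
import re


def contains_required_words(joke: str, required: list[str], lang: str = "en") -> bool:
    """Return True if every required word appears in the joke (hypothetical helper)."""
    text = joke.casefold()
    for word in required:
        w = word.casefold()
        if lang == "zh":
            # Chinese is not whitespace-delimited, so fall back to substring matching.
            if w not in text:
                return False
        else:
            # Whole-word match for space-delimited languages such as English and Spanish.
            if not re.search(rf"(?<!\w){re.escape(w)}(?!\w)", text):
                return False
    return True
```

A check like this could be used to filter or regenerate candidate jokes that miss a required word; it does not address the topical (headline) constraint or inflected word forms.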
CS_Metro at PsyDefDetect: Detecting Psychological Defense Mechanisms in Mental Health Dialogues with Summarization-Enhanced Transformer Ensembles
Oarisa Rebayet | Radiul Walee | Symom Hossain Shohan | Kawsar Ahmed | Mohammed Moshiul Hoque
Proceedings of the BioNLP 2026 (Shared Tasks)
Detecting psychological defense mechanisms in supportive conversations is essential for assisting mental health practitioners. Natural language processing techniques are increasingly integral to such systems, enabling automated classification of defense levels to better understand help-seeker behavior and resistance patterns. In PsyDefDetect at BioNLP 2026, we address the task of nine-class defense level classification on the PSYDEFCONV corpus. We propose a three-stage pipeline combining LLM-based dialogue summarization, domain-specific transformer fine-tuning, and rule-based ensemble prediction. Additionally, we evaluate three mental health domain-specific transformers (Mental-BERT, Mental-RoBERTa, Mental-XLNet) alongside fine-tuned LLMs (Qwen3-4B, Qwen3-1.7B, Mistral-7B) under different input conditions. Experimental results on the released test-set gold labels show that our ensemble approach achieves the best performance, reaching 34.69% macro F1 and surpassing the baseline by 4.69 percentage points. On the official PsyDefDetect Leaderboard 1 (labels 1–8), the submitted system achieved a macro F1 score of 23.46%, ranking 15th out of 21 teams, while on Leaderboard 2 (labels 0–8), it achieved 30.04%, securing 14th place. These findings demonstrate that domain-specific transformers substantially outperform generic LLM fine-tuning on this specialized clinical task.
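Macro F1, the metric reported above, averages per-class F1 scores with equal weight per class, which is why the choice of label set (1–8 versus 0–8) changes the score. The sketch below is illustrative only: the function name `macro_f1` and the convention of scoring an absent class as 0 are assumptions, and how the official scorer treats excluded labels is not stated here.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over the given label set."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted instances scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, not data from the paper.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
all_labels = macro_f1(y_true, y_pred, labels=[0, 1, 2])
without_zero = macro_f1(y_true, y_pred, labels=[1, 2])
```

Because every class counts equally, a rare class that is poorly predicted lowers the score as much as a frequent one.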
2025
SemanticCuetSync@DravidianLangTech 2025: Multimodal Fusion for Hate Speech Detection - A Transformer Based Approach with Cross-Modal Attention
Md. Sajjad Hossain | Symom Hossain Shohan | Ashraful Islam Paran | Jawad Hossain | Mohammed Moshiul Hoque
Proceedings of the Fifth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
The rise of social media has significantly facilitated the rapid spread of hate speech. Detecting hate speech for content moderation is challenging, especially in low-resource languages (LRLs) like Telugu. Although some progress has been made in recent years on unimodal (text or image) hate speech detection in Telugu, there is a lack of research on multimodal hate speech detection, specifically using audio and text. In this regard, DravidianLangTech has arranged a shared task to address this challenge. This work explored three machine learning (ML), three deep learning (DL), and seven transformer-based models that integrate text and audio modalities using cross-modal attention for hate speech detection. The evaluation results demonstrate that mBERT achieved the highest F1 score of 49.68% using text alone. However, the proposed multimodal attention-based approach with Whisper-small+TeluguBERT-3 achieved an F1 score of 43.68%, which helped us achieve 3rd place in the shared task competition.
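Cross-modal attention, as named above, lets features from one modality attend over features from another. The following is a bare-bones sketch of scaled dot-product attention with text vectors as queries and audio vectors as keys and values. It omits the learned projection matrices a real model would have, and the direction of attention, the function names, and the toy vectors are assumptions rather than details from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(text_feats, audio_feats):
    """Each text vector (query) attends over all audio vectors (keys and values).

    Both inputs are lists of equal-dimension vectors; no learned weights are used.
    """
    d = len(text_feats[0])
    attended = []
    for q in text_feats:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in audio_feats]
        weights = softmax(scores)
        # Weighted sum of audio vectors gives an audio-informed text representation.
        attended.append([sum(w * v[j] for w, v in zip(weights, audio_feats)) for j in range(d)])
    return attended


# Toy features, not outputs of Whisper or TeluguBERT.
fused = cross_modal_attention([[0.0, 0.0], [1.0, 0.0]], [[0.0, 0.0], [2.0, 2.0]])
```

The output has one vector per text position, each a convex combination of the audio vectors, which can then be concatenated with or added to the text features before classification.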
2024
SemanticCUETSync at SemEval-2024 Task 1: Finetuning Sentence Transformer to Find Semantic Textual Relatedness
Md. Sajjad Hossain | Ashraful Islam Paran | Symom Hossain Shohan | Jawad Hossain | Mohammed Moshiul Hoque
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
Semantic textual relatedness is crucial to Natural Language Processing (NLP). Methodologies often exhibit superior performance in high-resource languages such as English compared to low-resource ones like Marathi, Telugu, and Spanish. This study leverages various machine learning (ML) approaches, including Support Vector Regression (SVR) and Random Forest, deep learning (DL) techniques such as Siamese Neural Networks, and transformer-based models such as MiniLM-L6-v2, Marathi-sbert, Telugu-sentence-bert-nli, and Roberta-bne-sentiment-analysis-es, to assess semantic relatedness across English, Marathi, Telugu, and Spanish. The developed transformer-based methods notably outperformed other models in determining semantic textual relatedness across these languages, achieving Spearman correlation coefficients of 0.822 (for English), 0.870 (for Marathi), 0.820 (for Telugu), and 0.677 (for Spanish). These results led to our work attaining rankings of 22nd (for English), 11th (for Marathi), 11th (for Telugu), and 14th (for Spanish).
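The Spearman correlation coefficient reported above measures how well the ranking of predicted relatedness scores matches the ranking of gold scores. A minimal pure-Python sketch follows, computing Pearson correlation on average ranks; the function names and the toy scores are illustrative assumptions, not the task's official scorer or data.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as Pearson correlation of ranks (undefined for constant input)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy predicted and gold relatedness scores.
rho = spearman([0.1, 0.4, 0.35, 0.8], [0.2, 0.3, 0.5, 0.9])
```

Because only ranks matter, a model can score well even if its raw relatedness values are on a different scale from the gold annotations.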