Hongyi Zhang

2026

McMaster NLP at SemEval-2026 Task 2: A Lightweight Multi-Feature System for Predicting Emotional Valence and Arousal over Time
Hongyi Zhang | Daniel Hu | Allison Lahnala
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

We present a lightweight, feature-based regression system for predicting valence (pleasantness) and arousal (activation) from longitudinal language data. The language data ranges from longer free-form ecological essays to short affect-word entries, organized by user and time, reflecting natural variation in affective expression and experience. Our approach combines four complementary signals: (i) sentence-level semantic embeddings, (ii) psycholinguistic category features capturing affect- and function-related word usage, (iii) similarity measures between the language data and archetypal sentences, and (iv) trainable user embeddings to account for between-user differences. The resulting feature vector is passed to a multi-layer perceptron trained to jointly predict valence and arousal. Our design provides a strong and interpretable baseline by making it possible to isolate the contribution of semantic, psycholinguistic, similarity, and user-specific signals. We further analyze our model’s predictions to identify which feature groups are most informative and where errors are concentrated across users and input types.


Most existing work on mental health prediction from language focuses on isolated posts, overlooking temporal dynamics in longitudinal timelines. We present McMaster NLP’s system for the CLPsych 2026 Shared Task, which centers on modeling mental health dynamics in social media timelines using the MIND framework~\cite{atzil_slonim_2025_mind}. The task comprises: (1) identifying adaptive and maladaptive self-state components within posts, (2) detecting moments of change in well-being, and (3) generating structured summaries. For self-state prediction, we leverage LLM-generated archetypal representations of language use as semantic anchors within a dual-encoder architecture, enabling interpretable prediction of subelements and their intensities through alignment with prototypical expressions of psychological states. For temporal dynamics, we use BiLSTM-based sequence models to detect moments of change. For summarization, we employ a prompt-based LLM to generate grounded, structured summaries emphasizing causal interactions and temporal progression of self-states. Finally, we analyze model failure modes with respect to human evaluation and identify directions for reconciling the MIND framework with how state-assessment models encode meaning.

2024


Selective Prefix Tuning for Pre-trained Language Models
Hongyi Zhang | Zuchao Li | Ping Wang | Hai Zhao
Findings of the Association for Computational Linguistics: ACL 2024

The prevalent approach for optimizing pre-trained language models in downstream tasks is fine-tuning. However, it is both time-consuming and memory-inefficient. In response, a more efficient method called Prefix Tuning, which inserts learnable vectors into each Transformer layer, has been proposed and proven effective. Recent investigations reveal that prefix tokens carry context-specific information, prompting the hypothesis that enhancing their specialization can improve model performance. To address this, we propose Selective Prefix Tuning (SPT), integrating a selective mechanism inspired by selective self-attention. Additionally, we introduce Selective Loss (SL) to encourage diversity in prefix tokens. Extensive experiments validate the effectiveness of SPT in sentence and token classification tasks. We contribute insight into the role of prefixes in model adaptation.

Co-authors

Zuchao Li 1

Kian Omoomi 1

Brian Miguel Pimentel 1

Aadi Sanghani 1

Akshay Krishna Sirigana 1

Vasudha Varadarajan 1

Ping Wang 1

Charles Welch 1

Hai Zhao 1

