Shu Yang

Other people with similar names: Shu Yang (University of British Columbia)


2025

pdf bib
Understanding How Value Neurons Shape the Generation of Specified Values in LLMs
Yi Su | Jiayi Zhang | Shu Yang | Xinhai Wang | Lijie Hu | Di Wang
Findings of the Association for Computational Linguistics: EMNLP 2025

Rapid integration of large language models (LLMs) into societal applications has intensified concerns about their alignment with universal ethical principles, as their internal value representations remain opaque despite behavioral alignment advancements. Current approaches struggle to systematically interpret how values are encoded in neural architectures, limited by datasets that prioritize superficial judgments over mechanistic analysis. We introduce ValueLocate, a mechanistic interpretability framework grounded in the Schwartz Values Survey, to address this gap. Our method first constructs ValueInsight, a dataset that operationalizes four dimensions of universal value through behavioral contexts in the real world. Leveraging this dataset, we develop a neuron identification method that calculates activation differences between opposing value aspects, enabling precise localization of value-critical neurons without relying on computationally intensive attribution methods. Our proposed validation method demonstrates that targeted manipulation of these neurons effectively alters model value orientations, establishing causal relationships between neurons and value representations. This work advances the foundation for value alignment by bridging psychological value frameworks with neuron analysis in LLMs.

pdf bib
Can Large Language Models Identify Implicit Suicidal Ideation? An Empirical Evaluation
Tong Li | Shu Yang | Junchao Wu | Jiyao Wei | Lijie Hu | Mengdi Li | Derek F. Wong | Joshua R. Oltmanns | Di Wang
Findings of the Association for Computational Linguistics: EMNLP 2025

Suicide remains a major global mental health challenge, and early intervention hinges on recognizing signs of suicidal ideation. In private conversations, such ideation is often expressed in subtle or conflicted ways, making detection especially difficult. Existing data sets are mainly based on public help-seeking platforms such as Reddit, which fail to capture the introspective and ambiguous nature of suicidal ideation in more private contexts. To address this gap, we introduce , a novel dataset of 1,200 test cases simulating implicit suicidal ideation within psychologically rich dialogue scenarios. Each case is grounded in psychological theory, combining the Death/Suicide Implicit Association Test (D/S-IAT) patterns, expanded suicidal expressions, cognitive distortions, and contextual stressors. In addition, we propose a psychology-guided evaluation framework to assess the ability of LLMs to identify implicit suicidal ideation through their responses. Experiments with eight widely used LLMs across varied prompting conditions reveal that current models often struggle significantly to recognize implicit suicidal ideation. Our findings highlight the urgent need for more clinically grounded evaluation frameworks and design practices to ensure the safe use of LLMs in sensitive support systems.