Junlin Wu

2026

Large language model (LLM) personalization typically relies on modeling each user in isolation, conditioning on their historical interactions to adapt model behavior. However, this user-centric formulation overlooks the collective knowledge shared across users, limiting generalization for users with sparse histories and amplifying overfitting for those with highly skewed behaviors. We argue that effective personalization requires leveraging both individual preferences and population-level patterns. To this end, we propose LoGo, a Local–Global knowledge framework that augments user-specific signals with global knowledge that encodes collective behavioral trends. LoGo models global knowledge through a temporally evolving process that captures how population-wide preferences change over time, and a community-aware structure that organizes users into coherent groups with shared interests. To balance potentially conflicting local and global signals, LoGo employs a mediator module that adaptively fuses the two knowledge sources. Experiments on five personalization benchmarks show that LoGo consistently enhances personalization quality, outperforming existing methods by improving generalization for users with limited histories and mitigating bias for users with abundant histories. These results demonstrate the central role of collective knowledge in advancing LLM personalization. Our code is publicly available at https://github.com/Zehong-Wang/LoGo.

2024

RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Jiongxiao Wang | Junlin Wu | Muhao Chen | Yevgeniy Vorobeychik | Chaowei Xiao
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align Large Language Models (LLMs) with human preferences, and it plays an important role in LLM alignment. Despite its advantages, RLHF relies on human annotators to rank text, which can introduce security vulnerabilities if an adversarial annotator (i.e., an attacker) manipulates the ranking by up-ranking malicious text to steer the LLM adversarially. To red-team RLHF against poisoning of human preference data, we propose RankPoison, a poisoning attack that selects candidates for preference rank flipping to induce certain malicious behaviors (e.g., generating longer sequences, which can increase the computational cost). With the poisoned dataset generated by RankPoison, we can perform poisoning attacks that make LLMs generate longer sequences without hurting the original safety alignment performance. Moreover, applying RankPoison, we also successfully implement a backdoor attack in which LLMs generate longer answers to questions that contain the trigger word. Our findings highlight critical security challenges in RLHF, underscoring the necessity for more robust alignment methods for LLMs.

2018

CARER: Contextualized Affect Representations for Emotion Recognition
Elvis Saravia | Hsien-Chi Toby Liu | Yen-Hao Huang | Junlin Wu | Yi-Shin Chen
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Emotions are expressed in nuanced ways, which vary with collective or individual experiences, knowledge, and beliefs. Therefore, to understand emotion as conveyed through text, a robust mechanism capable of capturing and modeling different linguistic nuances and phenomena is needed. We propose a semi-supervised, graph-based algorithm to produce rich structural descriptors which serve as the building blocks for constructing contextualized affect representations from text. The pattern-based representations are further enriched with word embeddings and evaluated through several emotion recognition tasks. Our experimental results demonstrate that the proposed method outperforms state-of-the-art techniques on emotion recognition tasks.

Co-authors

Zheli Liu 1

Elvis Saravia 1

Zhaoxuan Tan 1

Yevgeniy Vorobeychik 1
