Xue Feng
2025
Enhancing LLM-Based Social Bot via an Adversarial Learning Framework
Fanqi Kong | Xiaoyuan Zhang | Xinyu Chen | Yaodong Yang | Song-Chun Zhu | Xue Feng
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Developing Large Language Model (LLM) agents that exhibit human-like behavior, encompassing not only individual heterogeneity rooted in unique user profiles but also adaptive responses to socially connected neighbors, is a significant research challenge. Social media platforms, with their diverse user data and explicit social structures, provide an ideal testbed for such investigations. This paper introduces EvoBot, an **Evo**lving LLM-based social **Bot** that significantly enhances human-like generative capabilities through a novel adversarial learning framework. EvoBot is initialized by Supervised Fine-Tuning (SFT) on representative data from social media and then iteratively refines its generation of sophisticated, human-like content via Direct Preference Optimization (DPO). This refinement is guided by feedback from a co-adapting **Detector**, which concurrently improves its ability to distinguish EvoBot from humans, thereby creating an increasingly challenging learning environment for EvoBot. Experiments demonstrate that EvoBot generates content aligned with diverse user profiles, increasingly bypassing the co-adapting Detector through human-like expression. Moreover, it exhibits strong social responsiveness, more accurately modeling real-world opinion dynamics and information spread in multi-agent simulations. The framework also yields a more robust Detector, underscoring its broader utility for both advanced agent development and related detection tasks. The code is available at https://anonymous.4open.science/r/EvoBot-036D.
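The adversarial loop described in this abstract can be illustrated with a minimal, hypothetical sketch (not the paper's implementation): the Detector is a toy TF-IDF + logistic-regression classifier, and `generate_candidates` and `dpo_update` are placeholder stand-ins for sampling from the bot LLM and running a DPO step on detector-derived preference pairs.

```python
# Minimal sketch of the adversarial bot/Detector loop, assuming toy data and
# placeholder functions. Not the paper's actual implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def generate_candidates(profile: str, k: int = 4) -> list[str]:
    """Hypothetical stand-in for sampling k candidate posts from the bot LLM."""
    return [f"[{profile}] candidate post {i}" for i in range(k)]


def dpo_update(preference_pairs: list[tuple[str, str]]) -> None:
    """Placeholder for a Direct Preference Optimization step on the bot LLM
    (e.g. via an RLHF/DPO library); the actual update is omitted here."""
    print(f"DPO update on {len(preference_pairs)} preference pairs")


human_posts = [
    "just watched the sunset, feeling grateful",
    "coffee first, opinions later",
    "anyone else stuck in traffic on 5th ave?",
]
profiles = ["sports fan", "food blogger", "commuter"]

detector = make_pipeline(TfidfVectorizer(), LogisticRegression())

for round_idx in range(3):
    # 1) The bot generates candidate content conditioned on user profiles.
    bot_posts = [c for p in profiles for c in generate_candidates(p)]

    # 2) The Detector co-adapts: it learns to separate bot content (label 1)
    #    from human content (label 0) produced in this round.
    texts = human_posts + bot_posts
    labels = [0] * len(human_posts) + [1] * len(bot_posts)
    detector.fit(texts, labels)

    # 3) Detector feedback builds DPO preference pairs per profile:
    #    chosen = candidate scored most human-like, rejected = most bot-like.
    pairs = []
    for p in profiles:
        cands = generate_candidates(p)
        scores = detector.predict_proba(cands)[:, 1]  # estimated P(bot)
        pairs.append((cands[scores.argmin()], cands[scores.argmax()]))

    # 4) The bot is refined against the progressively harder Detector.
    dpo_update(pairs)
```

The sketch only captures the structure of the loop (SFT-initialized bot, co-adapting classifier, detector-guided preference pairs); the paper's components are an LLM bot and a learned social-bot detector rather than these toy stand-ins.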
Are the Values of LLMs Structurally Aligned with Humans? A Causal Perspective
Yipeng Kang | Junqi Wang | Yexin Li | Mengmeng Wang | Wenming Tu | Quansen Wang | Hengli Li | Tingjun Wu | Xue Feng | Fangwei Zhong | Zilong Zheng
Findings of the Association for Computational Linguistics: ACL 2025
As large language models (LLMs) become increasingly integrated into critical applications, aligning their behavior with human values presents significant challenges. Current methods, such as Reinforcement Learning from Human Feedback (RLHF), typically focus on a limited set of coarse-grained values and are resource-intensive. Moreover, the correlations between these values remain implicit, leading to unclear explanations for value-steering outcomes. Our work argues that a latent causal value graph underlies the value dimensions of LLMs and that, despite alignment training, this structure remains significantly different from human value systems. We leverage these causal value graphs to guide two lightweight value-steering methods: role-based prompting and sparse autoencoder (SAE) steering, effectively mitigating unexpected side effects. Furthermore, SAE provides a more fine-grained approach to value steering. Experiments on Gemma-2B-IT and Llama3-8B-IT demonstrate the effectiveness and controllability of our methods.
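As a rough illustration of the SAE-steering idea mentioned in this abstract, the following toy sketch adds one SAE feature's decoder direction to hidden activations. Everything here is an assumption for illustration: the SAE uses random, tied weights, the feature index is arbitrary, and no real model is involved; in the paper's setting the SAE would be trained on the model's activations and the feature chosen with guidance from the causal value graph.

```python
# Toy sketch of steering hidden states along one SAE feature direction.
# Random tied-weight SAE and arbitrary feature index are illustrative assumptions.
import torch

d_model, n_features = 64, 512
torch.manual_seed(0)

# Toy SAE over the residual-stream dimension, with tied encoder/decoder weights
# (a common SAE simplification).
W_enc = torch.randn(d_model, n_features) / d_model**0.5
W_dec = W_enc.T


def sae_features(h: torch.Tensor) -> torch.Tensor:
    """Sparse feature activations for hidden states h (ReLU encoder, biases omitted)."""
    return torch.relu(h @ W_enc)


def steer(h: torch.Tensor, feature_idx: int, alpha: float) -> torch.Tensor:
    """Add the decoder direction of one (hypothetically value-related) SAE feature
    to the hidden states, scaled by alpha."""
    direction = W_dec[feature_idx]
    return h + alpha * direction / direction.norm()


# Hidden states for a toy batch: 2 sequences of length 5.
h = torch.randn(2, 5, d_model)

VALUE_FEATURE = 42  # hypothetical index of a feature correlated with a target value
h_steered = steer(h, VALUE_FEATURE, alpha=4.0)

# With tied weights, the shift raises the targeted feature's mean activation
# while leaving the tensor shape (and all other dimensions) intact.
print(sae_features(h)[..., VALUE_FEATURE].mean().item(),
      sae_features(h_steered)[..., VALUE_FEATURE].mean().item())
print(h_steered.shape)  # torch.Size([2, 5, 64])
```

The companion method in the abstract, role-based prompting, needs no machinery beyond prepending a role description to the prompt; the causal value graph is what tells you which value dimensions (and hence which features or roles) can be steered without unintended side effects on correlated values.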
Co-authors
- Xinyu Chen (陈欣雨) 1
- Yipeng Kang 1
- Fanqi Kong 1
- Yexin Li 1
- Hengli Li 1