Yang Liu

Other people with similar names: Yang Liu (Wilfrid Laurier University), Yang Liu (刘扬; Ph.D. Purdue; ICSI, Dallas, Facebook, Liulishuo, Amazon), Yang Liu (刘洋; ICT, Tsinghua, Beijing Academy of Artificial Intelligence), Yang Liu (Edinburgh Ph.D., Microsoft), Yang Liu (University of Helsinki), Yang Liu (Samsung Research Center Beijing), Yang Liu (Tianjin University, China), Yang Liu (Microsoft Cognitive Services Research), Yang Liu (Univ. of Michigan, UC Santa Cruz), Yang Liu (National University of Defense Technology), Yang Janet Liu (Georgetown University; 刘洋), Yang Liu (刘扬; Peking University), Yang Liu (The Chinese University of Hong Kong (Shenzhen)), Yang Liu (3M Health Information Systems), Yang Liu (Beijing Language and Culture University), and several other Yang Liu profiles without listed affiliations.
2025
Socratic Style Chain-of-Thoughts Help LLMs to be a Better Reasoner
Jiangbo Pei | Peiyu Liu | Wayne Xin Zhao | Aidong Men | Yang Liu
Findings of the Association for Computational Linguistics: ACL 2025
Synthetic data generation has emerged as a promising approach to enhance the reasoning capabilities of large language models. However, existing methods remain hindered by high costs, either through expensive API access or additional intermediate training, and are limited in their ability to generalize across different domains. To address these challenges, we propose a multi-agent debate framework based on the Socratic questioning strategy, abbreviated as SoDa. Unlike previous methods that prioritize data quantity, we highlight the wisdom of Socratic questioning in augmenting reasoning quality: deepening the thinking process to encourage exploration, and broadening it to motivate self-reflection on each question. Combined with our efficient production pipeline, SoDa enables scaling while keeping costs affordable. We use SoDa to generate diverse datasets for mathematics and code generation tasks with the Qwen2.5-7B-Instruct model, successfully fine-tuning a range of foundation models, from general-purpose ones to OpenAI o1-like ones. For mathematics, the experimental results show that SoDa outperforms existing datasets at the same scale, achieving improvements ranging from 1.3% to 13.5%. Remarkably, SoDa with 30K examples even surpasses the ScaleQuest dataset with 1000K samples, demonstrating significant efficiency. Our findings highlight the potential of SoDa as a universal, scalable, and cost-effective method for enhancing reasoning capabilities in large models across domains.
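As a rough illustration of the kind of deepen-and-broaden loop the abstract describes, the sketch below shows one way a Socratic-questioning generation round could be wired up. The `call_model` callable, the prompts, and the fixed round count are assumptions made for the sketch, not details taken from the paper.

```python
from typing import Callable, Dict, List

def socratic_augment(problem: str,
                     call_model: Callable[[str], str],
                     max_rounds: int = 3) -> Dict[str, object]:
    """Deepen and broaden one reasoning trace via Socratic follow-up questions."""
    trace: List[str] = [call_model(f"Solve step by step:\n{problem}")]
    for _ in range(max_rounds):
        # Deepening: ask a Socratic question that probes the weakest step.
        question = call_model(
            "Ask one Socratic question that challenges the weakest step "
            f"in this solution:\n{trace[-1]}"
        )
        # Broadening: answer the question and revise the solution (self-reflection).
        trace.append(call_model(
            f"Question: {question}\nRevise the solution accordingly:\n{trace[-1]}"
        ))
    return {"problem": problem, "chain_of_thought": trace, "answer": trace[-1]}
```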
COSMIC: Generalized Refusal Direction Identification in LLM Activations
Vincent Siu | Nicholas Crispino | Zihao Yu | Sam Pan | Zhun Wang | Yang Liu | Dawn Song | Chenguang Wang
Findings of the Association for Computational Linguistics: ACL 2025
Large Language Models encode behaviors like refusal within their activation space, but identifying these behaviors remains challenging. Existing methods depend on predefined refusal templates detectable in output tokens or on manual review. We introduce **COSMIC** (Cosine Similarity Metrics for Inversion of Concepts), an automated framework for direction selection that optimally identifies steering directions and target layers using cosine similarity, entirely independent of output text. COSMIC achieves steering effectiveness comparable to prior work without any prior knowledge of or assumptions about a model's refusal behavior, such as the use of certain refusal tokens. Additionally, COSMIC successfully identifies refusal directions in adversarial scenarios and in models with weak safety alignment, demonstrating its robustness across diverse settings.
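To make the activation-space idea concrete, here is a minimal sketch (not the paper's exact criterion) of ranking candidate difference-of-means refusal directions per layer purely by cosine similarity over cached activations, never touching output text. The array layout and scoring rule are assumptions for illustration.

```python
import numpy as np

def candidate_directions(acts_harmful: np.ndarray,
                         acts_harmless: np.ndarray) -> np.ndarray:
    """acts_*: (layers, examples, hidden) residual-stream activations."""
    diff = acts_harmful.mean(axis=1) - acts_harmless.mean(axis=1)   # (layers, hidden)
    return diff / np.linalg.norm(diff, axis=-1, keepdims=True)

def score_layers(acts_harmful: np.ndarray,
                 acts_harmless: np.ndarray,
                 directions: np.ndarray) -> np.ndarray:
    """Mean cosine similarity between each layer's candidate direction and the
    per-example shift from the harmless centroid to each harmful activation."""
    shifts = acts_harmful - acts_harmless.mean(axis=1, keepdims=True)  # (L, N, H)
    shifts = shifts / np.linalg.norm(shifts, axis=-1, keepdims=True)
    return np.einsum("lnh,lh->ln", shifts, directions).mean(axis=1)   # (L,)
```

Under this reading, the steering direction and target layer would simply be the argmax of `score_layers(...)`.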
Toward Optimal LLM Alignments Using Two-Player Games
Rui Zheng | Hongyi Guo | Zhihan Liu | Xiaoying Zhang | Yuanshun Yao | Xiaojun Xu | Zhaoran Wang | Zhiheng Xi | Tao Gui | Qi Zhang | Xuanjing Huang | Yang Liu | Hang Li
Findings of the Association for Computational Linguistics: EMNLP 2025
Alignment of large language models (LLMs) is a process that ensures the model's responses to user prompts align with human intentions and social values. This optimization typically relies on pre-collected prompts. Collecting these prompts often either requires careful human intervention or proves difficult to achieve good coverage of all the scenarios in which an LLM can improve. To address this issue, we propose an alignment method based on a two-agent game, consisting of an adversarial agent and a defensive agent. The adversarial agent's task is to generate prompts that expose the deficiencies of the defensive agent. At the same time, the defensive agent improves its performance on the prompts generated by the adversary based on feedback from the reward model. This iterative process is repeated to enhance the model's performance. We theoretically demonstrate that, under mild assumptions, this iterative alignment process converges to a Nash equilibrium between the two agents. Learning in this competitive environment results in policies with better generalization capabilities. We demonstrate the advantage of our framework through extensive experiments.
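One plausible reading of the iterative two-agent game is sketched below; the `Adversary`, `Defender`, and `RewardModel` interfaces and update rules are placeholders, not the authors' training code.

```python
from typing import List, Protocol

class Adversary(Protocol):
    def sample_prompts(self, n: int) -> List[str]: ...
    def update(self, prompts: List[str], rewards: List[float]) -> None: ...

class Defender(Protocol):
    def respond(self, prompt: str) -> str: ...
    def update(self, prompts: List[str], responses: List[str],
               rewards: List[float]) -> None: ...

class RewardModel(Protocol):
    def score(self, prompt: str, response: str) -> float: ...

def two_player_alignment(adversary: Adversary, defender: Defender,
                         reward_model: RewardModel,
                         n_iters: int = 10, batch_size: int = 64) -> Defender:
    """Iterate the game: the adversary is rewarded when the defender scores
    poorly; the defender then trains on exactly those hard prompts."""
    for _ in range(n_iters):
        prompts = adversary.sample_prompts(batch_size)
        responses = [defender.respond(p) for p in prompts]
        rewards = [reward_model.score(p, r) for p, r in zip(prompts, responses)]
        adversary.update(prompts, [-r for r in rewards])  # seek defender failures
        defender.update(prompts, responses, rewards)      # patch the weaknesses
    return defender
```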
2023
T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation
Jialu Wang | Xinyue Liu | Zonglin Di | Yang Liu | Xin Wang
Findings of the Association for Computational Linguistics: ACL 2023
*Warning: This paper contains content that may be toxic, harmful, or offensive.* In the last few years, text-to-image generative models have achieved remarkable success in generating images of unprecedented quality, accompanied by a breakthrough in inference speed. Despite this rapid progress, human biases that manifest in the training examples, particularly common stereotypical biases such as gender and skin tone, are still found in these generative models. In this work, we seek to measure the more complex human biases that exist in the task of text-to-image generation. Inspired by the well-known Implicit Association Test (IAT) from social psychology, we propose a novel Text-to-Image Association Test (T2IAT) framework that quantifies the implicit stereotypes between concepts and valence, and those in the generated images. We replicate previously documented bias tests on generative models, including morally neutral tests on flowers and insects as well as demographic stereotypical tests on diverse social attributes. The results of these experiments demonstrate the presence of complex stereotypical behaviors in image generation.
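For intuition, an IAT-style association score over generated-image embeddings could be computed as below. This is a WEAT-style effect size (the embedding analogue of the IAT from Caliskan et al.) used here purely as a stand-in; the statistic T2IAT actually reports may differ.

```python
import numpy as np

def _cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def association_effect(X: np.ndarray, Y: np.ndarray,
                       A: np.ndarray, B: np.ndarray) -> float:
    """WEAT-style effect size between two concept image sets (X, Y) and two
    valence/attribute image sets (A, B); rows are image embeddings."""
    def s(w: np.ndarray) -> np.ndarray:
        # Differential association of each embedding in w with A versus B.
        return _cosine(w, A).mean(axis=1) - _cosine(w, B).mean(axis=1)
    s_x, s_y = s(X), s(Y)
    pooled = np.concatenate([s_x, s_y])
    return float((s_x.mean() - s_y.mean()) / pooled.std(ddof=1))
```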