Yang Liu
UC Santa Cruz
Other people with similar names:
Yang Janet Liu (Georgetown University; 刘洋),
Yang Liu (Tsinghua),
Yang Liu (Fudan),
Yang Liu (BIGAI),
Yang Liu,
Yang Liu (Hunan),
Yang Liu,
Yang Liu (3M Health Information Systems),
Yang Liu,
Yang Liu,
Yang Liu (South China University of Technology),
Yang Liu,
Yang Liu,
Yang Liu (NTU),
Yang Liu (Sun Yat-sen University),
Yang Liu (North Carolina Central University),
Yang Liu (Beijing Language and Culture University),
Yang Liu (National University of Defense Technology),
Yang Liu (Edinburgh Ph.D., Microsoft),
Yang Liu (University of Helsinki),
Yang Liu (The Chinese University of Hong Kong (Shenzhen)),
Yang Liu (刘扬; Ph.D. Purdue; ICSI, Dallas, Facebook, Liulishuo, Amazon),
Yang Liu (刘洋; ICT, Tsinghua, Beijing Academy of Artificial Intelligence),
Yang Liu (Microsoft Cognitive Services Research),
Yang Liu (刘扬) (Peking University),
Yang Liu (Samsung Research Center Beijing),
Yang Liu (Tianjin University, China),
Yang Liu (Univ. of Michigan, UC Santa Cruz),
Yang Liu (Wilfrid Laurier University)
Unverified author pages with similar names:
Yang Liu
2026
The rapid advancement of large language models (LLMs) has led to growing interest in using synthetic data to train future models. However, this creates a self-consuming retraining loop in which models are trained on their own outputs, which may degrade performance and induce emerging biases. In real-world applications, previously deployed LLMs may influence the data they generate, leading to a dynamic system driven by user feedback. For example, if a model continues to underserve users from a group, less query data will be collected from that demographic. In this study, we introduce the concept of the Self-Consuming Performative Loop (SCPL) and investigate the role of synthetic data in shaping bias during these dynamic iterative training processes under controlled performative feedback. This controlled setting is motivated by the inaccessibility of real-world user preference data from dynamic production systems, and enables us to isolate and analyze feedback-driven bias evolution in a principled manner. We focus on two types of loops: the typical retraining setting and the incremental fine-tuning setting, which is largely underexplored. Through experiments on three real-world tasks, we find that the performative loop increases preference bias and decreases disparate bias. We design a reward-based rejection sampling strategy to mitigate the bias, moving towards more trustworthy self-improving systems. The code is available at https://github.com/UCSC-REAL/SCPL.git.
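As a rough illustration of what reward-based rejection sampling can look like in general, the sketch below scores a set of candidate generations and keeps the best one only if it clears a threshold. This is not the procedure from the paper or its repository: `rejection_sample`, `toy_reward`, the blocklist, and the threshold value are all hypothetical names and choices made for this example, and a real system would use a learned reward model.

```python
from typing import Callable, List, Optional


def rejection_sample(
    candidates: List[str],
    reward_fn: Callable[[str], float],
    threshold: float,
) -> Optional[str]:
    """Return the highest-reward candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    # Score every candidate once, then keep the best-scoring one.
    scored = [(reward_fn(c), c) for c in candidates]
    best_reward, best = max(scored, key=lambda pair: pair[0])
    # Reject the whole batch when even the best candidate is below the bar.
    return best if best_reward >= threshold else None


def toy_reward(text: str) -> float:
    """Hypothetical stand-in for a reward model: share of words outside a blocklist."""
    blocked = {"always", "never"}  # assumption: illustrative only
    words = text.lower().split()
    if not words:
        return 0.0
    return 1.0 - sum(w in blocked for w in words) / len(words)


if __name__ == "__main__":
    samples = ["users never ask", "users sometimes ask"]
    print(rejection_sample(samples, toy_reward, threshold=0.9))
```

Returning `None` for a rejected batch lets the caller decide whether to resample or drop the prompt from the next training round.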
Proactive task-oriented agents must autonomously anticipate user needs, identify actionable opportunities, and trigger software actions at appropriate moments, a fundamental shift from reactive systems that await explicit instructions. However, existing approaches lack generalizable end-to-end solutions for measuring and optimizing such anticipatory behaviors. This paper introduces ProActor, a unified framework for conversational task scheduling that integrates: (1) a domain-agnostic automated annotation methodology that enables scalable proactiveness reinforcement learning (RL) by generating full opportunity time windows instead of rigid point labels, (2) systematic proactiveness metrics capturing both timing quality and reference action alignment, and (3) RL optimization using GRPO with various reward designs. Our insight is that RULER-based rewards with proactiveness rubrics are crucial for improving timing quality, and that proactiveness optimization enabled by stage-aware composite rewards is key to balancing timing quality and reference action alignment. Furthermore, we introduce ART-F, an adaptive RL framework that combines request-adaptive inference clusters with asynchronous training for better GPU utilization, enabling LoRA training of 4-bit Qwen2.5-14B-ProActor-Q4 models on 4×H200 and 8×H100 GPUs with substantial speedups. Experiments on two newly auto-annotated datasets demonstrate significant improvements in proactive timing while maintaining action consistency comparable to state-of-the-art baselines. Ablations validate the effectiveness of distinct composite reward variations.
2025
Large Language Models encode behaviors like refusal within their activation space, but identifying these behaviors remains challenging. Existing methods depend on predefined refusal templates detectable in output tokens or manual review. We introduce **COSMIC** (Cosine Similarity Metrics for Inversion of Concepts), an automated framework for direction selection that optimally identifies steering directions and target layers using cosine similarity, entirely independent of output text. COSMIC achieves steering effectiveness comparable to prior work without any prior knowledge or assumptions of a model’s refusal behavior such as the use of certain refusal tokens. Additionally, COSMIC successfully identifies refusal directions in adversarial scenarios and models with weak safety alignment, demonstrating its robustness across diverse settings.
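For readers unfamiliar with the underlying measure, the sketch below computes cosine similarity between two vectors and uses it to pick, from several candidate directions, the one most aligned with a reference vector. This is only an illustration of the measure: the selection rule, the function names, and the toy two-dimensional vectors are assumptions for this example and do not reproduce COSMIC's actual scoring over model activations.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; both must be non-zero and equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def pick_most_aligned(
    candidates: Dict[str, Sequence[float]],
    reference: Sequence[float],
) -> str:
    """Hypothetical selection rule: name of the candidate most aligned with reference."""
    return max(candidates, key=lambda name: cosine_similarity(candidates[name], reference))


if __name__ == "__main__":
    # Toy candidate directions keyed by made-up layer labels.
    directions = {"layer_3": [1.0, 0.0], "layer_7": [0.6, 0.8]}
    print(pick_most_aligned(directions, reference=[0.0, 1.0]))
```

Because cosine similarity depends only on direction, not magnitude, it can compare vectors drawn from layers with very different activation scales.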
Alignment of large language models (LLMs) is a process that ensures the model’s responses to user prompts align with human intentions and social values. This optimization typically relies on pre-collected prompts. Collecting these prompts often requires careful human intervention, and it is difficult to achieve good coverage of all scenarios in which an LLM can improve. To address this issue, we propose an alignment method based on a two-agent game, consisting of an adversarial agent and a defensive agent. The adversarial agent’s task is to generate prompts that expose the deficiencies of the defensive agent. At the same time, the defensive agent improves its performance on the prompts generated by the adversary based on feedback from the reward model. This iterative process is repeated to enhance the model’s performance. We theoretically demonstrate that, under mild assumptions, this iterative alignment process converges to a Nash equilibrium for both agents. Learning in this competitive environment results in policies with better generalization capabilities. We demonstrate the advantage of our framework using extensive experiments.
Synthetic data generation has emerged as a promising approach to enhance the reasoning capabilities of large language models. However, existing methods remain hindered by high costs, incurred either through expensive API access or additional intermediate training, and are limited in their ability to generalize across different domains. To address these challenges, we propose a multi-agent debate framework based on the Socratic questioning strategy, abbreviated as SoDa. Distinguished from previous methods that prioritize data quantity, we highlight the wisdom of Socratic questioning in augmenting reasoning quality by deepening the thinking process to encourage exploration and broadening it to motivate self-reflection on each question. Combined with our efficient production pipeline, SoDa enables scaling while maintaining affordable costs. We use SoDa to generate diverse datasets for mathematics and code generation tasks with the Qwen2.5-7B-Instruct model, successfully fine-tuning a range of foundation models, from general-purpose ones to OpenAI o1-like ones. For mathematics, the experimental results show that SoDa outperforms existing datasets at the same scale, achieving improvements ranging from 1.3% to 13.5%. Remarkably, SoDa with 30K examples even surpasses the ScaleQuest dataset with 1000K samples, demonstrating significant efficiency. Our findings highlight the potential of SoDa as a universal, scalable, and cost-effective method for enhancing reasoning capabilities in large models across domains.
2023
*Warning: This paper contains content that may be toxic, harmful, or offensive.* In the last few years, text-to-image generative models have gained remarkable success in generating images with unprecedented quality, accompanied by a breakthrough in inference speed. Despite their rapid progress, human biases that manifest in the training examples, particularly common stereotypical biases such as those related to gender and skin tone, are still found in these generative models. In this work, we seek to measure more complex human biases that exist in the task of text-to-image generation. Inspired by the well-known Implicit Association Test (IAT) from social psychology, we propose a novel Text-to-Image Association Test (T2IAT) framework that quantifies the implicit stereotypes between concepts and valence, and those in the images. We replicate the previously documented bias tests on generative models, including morally neutral tests on flowers and insects as well as demographic stereotypical tests on diverse social attributes. The results of these experiments demonstrate the presence of complex stereotypical behaviors in image generation.
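Association tests of this kind are often summarized with a standardized effect size. The sketch below computes a Cohen's-d-style differential association between two groups of scores, in the spirit of embedding association tests. It is an assumption-laden illustration, not the T2IAT formula from the paper: the function name, the use of the pooled sample standard deviation, and the toy scores are all choices made for this example.

```python
from statistics import mean, stdev
from typing import Sequence


def effect_size(assoc_x: Sequence[float], assoc_y: Sequence[float]) -> float:
    """Difference in mean association between two concept groups,
    divided by the sample standard deviation of all scores pooled together."""
    pooled = list(assoc_x) + list(assoc_y)
    if len(pooled) < 2:
        raise ValueError("need at least two scores in total")
    spread = stdev(pooled)
    if spread == 0.0:
        raise ValueError("effect size is undefined when all scores are identical")
    return (mean(assoc_x) - mean(assoc_y)) / spread


if __name__ == "__main__":
    # Toy scores: how strongly images for each concept associate with
    # "pleasant" rather than "unpleasant" attributes (made-up numbers).
    flowers = [0.6, 0.7, 0.8]
    insects = [0.1, 0.2, 0.3]
    print(effect_size(flowers, insects))
```

A positive value indicates that the first concept is more strongly associated with the first attribute than the second concept is, and swapping the two groups flips the sign.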