Yi Pan

2025

The performance of aligned large language models (LLMs) across various tasks (such as being helpful, harmless, and honest) is heavily influenced by the composition of the training data. However, it is difficult to determine what mixture of data should be used to produce a model with strong performance across all tasks. Existing approaches rely on large ablation studies, heuristics, or human intuition, which can be prohibitively expensive and suboptimal. We study this problem in the context of preference optimization via DPO and propose a novel, theoretically justified algorithm, AutoMixAlign (AMA), that adaptively mixes datasets during LLM training to balance performance across multiple tasks. AMA first trains specialist models for each task to determine losses that correspond to strong task performance. Next, AMA trains a generalist model using a novel minimax optimization that prioritizes tasks for which the generalist model's losses are furthest from the specialist models' losses. We introduce two algorithms to optimize this problem: (1) AMA-R adaptively reweights the objective to prioritize tasks, and (2) AMA-S adaptively adjusts how much data is sampled from each task to prioritize tasks. Both algorithms achieve a convergence rate of O(1/√T) in the convex case. AMA-R's convergence result follows immediately from Sagawa et al. (2019), and we provide a convergence proof for AMA-S using techniques from online learning such as EXP3 (Auer et al., 2002). We evaluate AMA on several multitask alignment setups and observe that AMA outperforms both the standard alignment approach, which simply optimizes the total loss across all tasks, and model-merging methods.
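The two prioritization mechanisms named in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that per-task losses are already available as plain numbers. The excess loss ("gap") is taken to be the generalist loss minus the specialist loss, clipped at zero. The reweighting step is a group-DRO-style exponentiated-gradient update in the spirit of Sagawa et al. (2019). The sampler is textbook EXP3 (Auer et al., 2002), with rewards assumed to lie in [0, 1]. The function and class names, the clipping, and the step sizes `eta` and `gamma` are all hypothetical choices made for illustration.

```python
import math
import random


def reweight_tasks(weights, generalist_losses, specialist_losses, eta=0.1):
    """Hypothetical AMA-R-style step: exponentiated-gradient update on task weights.

    Tasks whose generalist loss exceeds the specialist reference by more get
    more weight. Clipping the gap at zero is an assumption of this sketch.
    """
    gaps = [max(g - s, 0.0) for g, s in zip(generalist_losses, specialist_losses)]
    unnormalized = [w * math.exp(eta * gap) for w, gap in zip(weights, gaps)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def weighted_objective(weights, generalist_losses):
    """Weighted sum of per-task losses that the model would minimize."""
    return sum(w * loss for w, loss in zip(weights, generalist_losses))


class Exp3TaskSampler:
    """Classic EXP3 over K tasks treated as bandit arms (hypothetical AMA-S-style use).

    The reward for a sampled task (e.g. a normalized excess loss) is assumed
    to lie in [0, 1].
    """

    def __init__(self, num_tasks, gamma=0.1, seed=0):
        self.k = num_tasks
        self.gamma = gamma
        self.log_weights = [0.0] * num_tasks
        self.rng = random.Random(seed)

    def probabilities(self):
        # Softmax of log-weights (shifted for numerical stability),
        # mixed with uniform exploration.
        m = max(self.log_weights)
        exp_w = [math.exp(lw - m) for lw in self.log_weights]
        total = sum(exp_w)
        return [(1 - self.gamma) * e / total + self.gamma / self.k for e in exp_w]

    def sample(self):
        # Draw the task from which the next training batch would come.
        return self.rng.choices(range(self.k), weights=self.probabilities(), k=1)[0]

    def update(self, task, reward):
        # Importance-weighted reward estimate. Only the sampled arm is updated,
        # and the weights are unchanged since sampling, so the probability
        # can be recomputed here.
        p = self.probabilities()[task]
        self.log_weights[task] += self.gamma * (reward / p) / self.k
```

For example, with three tasks, uniform starting weights, generalist losses `[0.9, 0.5, 0.7]`, and specialist losses `[0.4, 0.5, 0.6]`, `reweight_tasks` shifts weight toward task 0, which has the largest gap. Task 1 matches its specialist, so its weight shrinks. Reweighting changes the objective itself. Sampling leaves the objective unweighted and instead changes which task's data the model sees, so the update only needs a reward for the task that was actually drawn.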

2021

Traditional goal-oriented dialogue systems rely on various components such as natural language understanding, dialogue state tracking, policy learning, and response generation. Training each component requires annotations that are hard to obtain for every new domain, which limits the scalability of such systems. Similarly, rule-based dialogue systems require extensive writing and maintenance of rules and do not scale either. End-to-end dialogue systems, on the other hand, do not require module-specific annotations but need a large amount of training data. To overcome these problems, in this demo we present Alexa Conversations, a new approach for building goal-oriented dialogue systems that is scalable, extensible, and data-efficient. The components of this system are trained in a data-driven manner. However, instead of collecting annotated conversations for training, we generate them using a novel dialogue simulator based on a few seed dialogues and specifications of APIs and entities provided by the developer. Our approach provides out-of-the-box support for natural conversational phenomena such as entity sharing across turns or users changing their minds during a conversation, without requiring developers to provide any such dialogue flows. We exemplify our approach using a simple pizza ordering task and showcase its value in reducing the developer burden of creating a robust experience. Finally, we evaluate our system on a typical movie ticket booking task integrated with live APIs. The results show that the dialogue simulator is an essential component of the system, leading to an improvement of over 50% in turn-level action signature prediction accuracy.

2004

Sentence Compression for Automated Subtitling: A Hybrid Approach
Vincent Vandeghinste | Yi Pan
Text Summarization Branches Out
