Xi Chen


2025

Generative replay has proven effective in addressing the catastrophic forgetting issue of continual learning (CL) in natural language processing (NLP). However, relying on a single task-specific token or prompt often falls short in generating pseudo-samples that accurately reflect the true data distribution. This leads to issues of semantic inconsistency and scale inconsistency. To tackle these challenges, we propose a Prototype Conditioned Generative Replay (PCGR) method, which enhances generative replay by incorporating task-level statistics through a Prototype Conditioned Variational Autoencoder (PCVAE). Specifically, task-level embedding statistics are stored as prototypes for each old task. When a new task is introduced, PCVAE draws samples from task-specific prototype-based distributions to generate pseudo-samples. By incorporating the prototype, the generated pseudo-samples are both more representative and sufficiently diverse to reflect the real data distribution. Furthermore, as previously stored prototypes may become outdated due to evolving model parameters, we propose a Prototype Shift Estimation (PSE) to adjust for these changes. Experiments on NLP tasks across two different scenarios show that PCGR outperforms previous state-of-the-art (SOTA) methods.
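The idea of storing task-level embedding statistics as prototypes and sampling from them can be illustrated with a minimal sketch. It assumes a prototype is a per-dimension mean and standard deviation and that sampling uses a diagonal Gaussian; the abstract does not specify either choice. The names `build_prototype` and `sample_from_prototype` and the toy embeddings are hypothetical, and the PCVAE decoder and PSE step are omitted.

```python
import random
import statistics

def build_prototype(embeddings):
    """Per-dimension mean and standard deviation of one task's embeddings.

    Assumption: the prototype is (mean, std); the paper may store other statistics.
    """
    dims = list(zip(*embeddings))
    mean = [statistics.fmean(d) for d in dims]
    std = [statistics.pstdev(d) for d in dims]
    return mean, std

def sample_from_prototype(prototype, n, rng):
    """Draw n latent vectors from a diagonal Gaussian centred on the prototype."""
    mean, std = prototype
    return [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n)]

# Toy embeddings for two old tasks (hypothetical values).
task_embeddings = {
    "task_a": [[0.0, 1.0], [0.2, 0.8], [-0.2, 1.2]],
    "task_b": [[5.0, -3.0], [5.5, -2.5], [4.5, -3.5]],
}
prototypes = {name: build_prototype(e) for name, e in task_embeddings.items()}

# When a new task arrives, sample latents for an old task; a decoder
# (not shown) would turn these into replayed pseudo-samples.
rng = random.Random(0)
latents = sample_from_prototype(prototypes["task_b"], n=4, rng=rng)
```

Only the small `(mean, std)` pair is kept per old task, not the task's data, which is what makes replay from stored statistics cheap in memory.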
Continual learning is vital for task-oriented dialogue systems (ToDs), and AdapterCL, equipped with residual adapters, has proven effective in this domain. However, its performance is limited by training separate adapters for each task, preventing global knowledge sharing. To address this, we propose **Task-wrapped Continual Learning (TCL)**, a novel framework that employs **Task-Wrapped Adapters (TWAs)** to simultaneously learn both global and task-specific information through parameter sharing. TCL leverages task-conditioned hypernetworks to transfer global knowledge across tasks, enabling TWAs to start from a more informed initialization and to learn task-specific details efficiently while reducing model parameters. Additionally, the simple, linear structure of both hypernetworks and TWAs ensures stable training, with task-free inference supported through effective loss utilization. Across 37 ToD domains, TCL consistently outperforms AdapterCL, significantly reducing forgetting. Remarkably, by setting the task embedding dimension to 1, TCL achieves a 4.76% improvement over AdapterCL while using only 46% of the parameters. These findings position TWA as a lightweight, powerful alternative to traditional adapters, offering a promising solution for continual learning in ToDs. The code is available at https://github.com/cloversjtu/TCL.
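A task-conditioned hypernetwork that produces adapter weights from a task embedding can be sketched as follows. This is an illustrative toy under stated assumptions, not the TCL implementation: the hypernetwork is taken to be a single shared linear map, the adapter a residual linear layer, and the function names, domain names (`hotel`, `taxi`), and numeric values are all hypothetical.

```python
def hypernetwork(task_embedding, H, b):
    """Shared linear map from a task embedding to a flat adapter weight vector."""
    return [
        sum(h_ij * e_j for h_ij, e_j in zip(row, task_embedding)) + b_i
        for row, b_i in zip(H, b)
    ]

def adapter_forward(hidden, flat_weights):
    """Residual linear adapter: hidden + W @ hidden, with W reshaped from flat_weights."""
    d = len(hidden)
    W = [flat_weights[i * d:(i + 1) * d] for i in range(d)]
    return [
        h_i + sum(w * h for w, h in zip(W[i], hidden))
        for i, h_i in enumerate(hidden)
    ]

# Shared (global) hypernetwork parameters: shape (d*d, embedding_dim) with embedding_dim = 1.
H = [[0.1], [0.0], [0.0], [0.1]]
b = [0.0, 0.0, 0.0, 0.0]

# Per-task parameters: one scalar embedding per domain (hypothetical values).
task_embeddings = {"hotel": [1.0], "taxi": [-2.0]}

hidden = [1.0, 2.0]
out_hotel = adapter_forward(hidden, hypernetwork(task_embeddings["hotel"], H, b))
out_taxi = adapter_forward(hidden, hypernetwork(task_embeddings["taxi"], H, b))
```

In this toy setup `H` and `b` are shared by every task, while each task contributes only its embedding, which illustrates how parameter sharing can keep the per-task cost small when the embedding dimension is 1.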