Hongwei Du


2025

Despite recent efforts from the NLP community, balancing the training budget, downstream performance, and general capabilities of large language models (LLMs) remains a challenge in many applications. Training the entire model for downstream tasks is expensive and can easily result in catastrophic forgetting. Parameter-efficient fine-tuning (PEFT) reduces the training cost, but it still suffers from forgetting and limits learning on the downstream tasks. To address these issues, we propose a novel mixture-of-experts (MoE) framework based on Soft LoRA and Identity Mixture (SLIM). SLIM allows dynamic routing between LoRA adapters and identity layers, enabling the model to bypass the LoRA adapters and thereby suppress forgetting of general capabilities. We adopt weight yielding with sliding clustering to better distinguish out-of-domain inputs and enhance the routing. We also cast the mixture of LoRA adapters as a model-merging formulation and introduce dynamic merging, with a fast implementation, for LoRA adapters to preserve general capabilities. Extensive experiments demonstrate that SLIM is comparable to state-of-the-art PEFT approaches on downstream tasks while achieving the leading performance in mitigating catastrophic forgetting. We plan to open-source the code upon publication.
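The core routing idea in this abstract can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical rendering of a mixture over LoRA adapters plus one identity expert on top of a frozen linear layer; all class and parameter names (SoftLoRAIdentityMixture, num_loras, rank) are our own, and the paper's weight yielding, sliding clustering, and dynamic merging are not shown.

```python
# Minimal sketch (not the authors' code) of routing between LoRA adapters
# and an identity path on top of a frozen base layer. Names and routing
# details are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRAAdapter(nn.Module):
    def __init__(self, dim, rank=8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # standard LoRA init: adapter starts as a no-op

    def forward(self, x):
        return self.up(self.down(x))

class SoftLoRAIdentityMixture(nn.Module):
    """Frozen base layer plus a router over LoRA experts and one identity expert.

    When the router puts its mass on the identity expert (e.g. for
    out-of-domain inputs), the LoRA deltas are suppressed and the frozen
    base behaviour is preserved, which is the forgetting-mitigation idea
    described in the abstract.
    """
    def __init__(self, dim, num_loras=4, rank=8):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.base.weight.requires_grad_(False)
        self.base.bias.requires_grad_(False)
        self.loras = nn.ModuleList(LoRAAdapter(dim, rank) for _ in range(num_loras))
        # One logit per LoRA expert plus one for the identity (no-delta) expert.
        self.router = nn.Linear(dim, num_loras + 1)

    def forward(self, x):
        weights = F.softmax(self.router(x), dim=-1)  # (..., num_loras + 1)
        delta = torch.zeros_like(x)
        for i, lora in enumerate(self.loras):
            delta = delta + weights[..., i : i + 1] * lora(x)
        # The last expert is the identity: it contributes no delta, so any
        # routing mass assigned to it simply bypasses the adapters.
        return self.base(x) + delta

x = torch.randn(2, 16, 64)
layer = SoftLoRAIdentityMixture(dim=64)
print(layer(x).shape)  # torch.Size([2, 16, 64])
```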

2021

Generating long text conditioned on a short input text has recently attracted increasing research effort. Most existing approaches focus on introducing extra knowledge to supplement the short input, but ignore the coherence of the generated texts. To address this issue, this paper proposes a novel two-stage approach to generating coherent long text. Specifically, we first build a document-level path for each output text, with each sentence embedding as a node, and propose a revised self-organising map (SOM) to cluster similar nodes across a family of document-level paths into a directed semantic graph. Then, three subgraph alignment methods are proposed to extract the maximum matching paths or subgraphs. These directed subgraphs are considered to preserve extra content relevant to the short input text, and they are then decoded by a pre-trained model to generate coherent long text. Extensive experiments on three real-world datasets demonstrate that the proposed approach is superior to state-of-the-art approaches w.r.t. a number of evaluation criteria.
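To make the first stage concrete, here is a minimal, hypothetical sketch of clustering sentence embeddings with a plain 1-D SOM (not the paper's revised SOM) and linking the clusters of consecutive sentences into a directed semantic graph; all function names, hyperparameters, and the toy data are illustrative assumptions.

```python
# Minimal sketch, assuming sentence embeddings are already computed.
# A plain 1-D SOM stands in for the paper's revised SOM.
import numpy as np

def train_som(nodes, num_units=8, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Tiny 1-D self-organising map over sentence-embedding nodes."""
    rng = np.random.default_rng(seed)
    dim = nodes.shape[1]
    units = rng.normal(size=(num_units, dim))
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        radius = max(radius0 * (1 - epoch / epochs), 0.5)
        for x in nodes:
            # Best-matching unit: the closest unit vector to this node.
            bmu = int(np.argmin(np.linalg.norm(units - x, axis=1)))
            # Neighbourhood update: units near the BMU on the grid move toward x.
            grid_dist = np.abs(np.arange(num_units) - bmu)
            h = np.exp(-(grid_dist ** 2) / (2 * radius ** 2))
            units += lr * h[:, None] * (x - units)
    return units

def build_semantic_graph(doc_paths, units):
    """Assign each sentence to its SOM unit and add a directed edge between
    the units of consecutive sentences within each document-level path."""
    edges = set()
    for path in doc_paths:  # one path = ordered sentence embeddings of one document
        assigned = [int(np.argmin(np.linalg.norm(units - s, axis=1))) for s in path]
        for a, b in zip(assigned, assigned[1:]):
            if a != b:
                edges.add((a, b))
    return edges

# Toy usage with random stand-ins for sentence embeddings.
rng = np.random.default_rng(1)
docs = [rng.normal(size=(5, 32)) for _ in range(3)]
units = train_som(np.vstack(docs))
print(sorted(build_semantic_graph(docs, units)))
```

The directed edges recovered here are the raw material for the second stage; the paper's subgraph alignment methods, which select maximum matching paths or subgraphs for decoding, are beyond this sketch.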