Yifu Gao

2026

Full-parameter fine-tuning of large language models is constrained by substantial GPU memory demands. Low-rank adaptation methods mitigate this challenge by updating only a subset of parameters. However, these approaches often limit model expressiveness and yield lower performance than full-parameter fine-tuning. Layer-wise fine-tuning methods have emerged as an alternative, enabling memory-efficient training through static layer importance sampling strategies. However, these methods overlook variations in layer importance across tasks and training stages, resulting in suboptimal performance on downstream tasks. To address these limitations, we propose GRASS, a gradient-based adaptive layer-wise importance sampling framework. GRASS utilizes mean gradient norms as a task-aware and training-stage-aware metric for estimating layer importance. Furthermore, GRASS adaptively adjusts layer sampling probabilities through an adaptive training strategy. We also introduce a layer-wise optimizer state offloading mechanism to further reduce memory usage while maintaining comparable training throughput. Extensive experiments across multiple models and benchmarks demonstrate that GRASS consistently outperforms state-of-the-art methods, achieving an average accuracy improvement of up to 4.38 points and reducing memory usage by up to 19.97%.
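The gradient-based sampling idea in the abstract above can be illustrated with a minimal sketch. It is not the GRASS implementation: the function names, the `eps` smoothing term, and the simple proportional normalization are assumptions made for illustration only. The sketch turns per-layer mean gradient norms into sampling probabilities and then draws a set of distinct layers to update.

```python
import random


def layer_sampling_probs(grad_norms, eps=1e-8):
    """Map per-layer mean gradient norms to sampling probabilities.

    Assumption: probabilities are proportional to the (smoothed) norms.
    eps keeps every layer's probability strictly positive.
    """
    smoothed = [max(g, 0.0) + eps for g in grad_norms]
    total = sum(smoothed)
    return [s / total for s in smoothed]


def sample_layers(probs, k, rng):
    """Draw up to k distinct layer indices, weighted by probs."""
    remaining = list(range(len(probs)))
    weights = list(probs)
    chosen = []
    for _ in range(min(k, len(remaining))):
        # Pick a position in the remaining list, then remove it so the
        # same layer cannot be selected twice.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        chosen.append(remaining.pop(pos))
        weights.pop(pos)
    return sorted(chosen)


# Hypothetical mean gradient norms for a four-layer model.
probs = layer_sampling_probs([0.5, 2.0, 1.0, 0.5])
active_layers = sample_layers(probs, k=2, rng=random.Random(0))
```

In a training loop, the norms would be re-estimated periodically so that the probabilities track the current task and training stage, which is the adaptive behavior the abstract describes.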


Emotional support conversation (ESC) aims to alleviate users’ psychological stress. Selecting the appropriate strategy is crucial for effective emotional support. Current strategy planner-based methods prioritize immediate responses while neglecting users’ future reactions. Some studies retrieve historical examples with emotions similar to the current utterance, then anticipate future emotions based on the next-turn emotions of those examples. However, their retrieval focuses on the current emotion (i.e., a single-turn emotion state) and ignores how the user’s emotion evolved before the current state. We argue that retrieval that considers whole emotional trajectories enables models to capture dynamic emotional needs, thereby enhancing the anticipation of future emotions. To this end, we propose a Markov-driven emotion anticipation framework with emotion trajectory-aware retrieval for LLM-based ESC, which anticipates future emotion states to guide strategy planning and achieve sustained emotional support. First, we construct a dynamic emotion memory and perform hierarchical retrieval that combines semantic matching and emotion trajectory alignment. Then, we model emotional transitions as Markov chains, leveraging trajectory-aware retrieval to estimate future emotions. Finally, we use the anticipated emotion to steer LLMs in generating candidate strategies and introduce active online learning to optimize the planner, boosting its robustness across diverse users. Experiments on two datasets with two models show that our method outperforms all baselines.
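The Markov-chain step in the abstract above can be sketched as follows. This is an illustrative assumption, not the paper's method: it uses a first-order chain, estimates transition probabilities by simple counting over retrieved trajectories, and the emotion labels and function names are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_transitions(trajectories):
    """Estimate first-order transition probabilities by counting.

    trajectories: list of emotion-label sequences, e.g. the retrieved
    historical trajectories (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        for cur, nxt in zip(traj, traj[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for cur, nxts in counts.items()
    }


def anticipate_next(transitions, current):
    """Return the most likely next emotion, or None if current is unseen."""
    dist = transitions.get(current)
    if not dist:
        return None
    return max(dist, key=dist.get)


# Hypothetical retrieved trajectories.
retrieved = [
    ["sad", "sad", "calm"],
    ["sad", "calm", "calm"],
]
transitions = estimate_transitions(retrieved)
predicted = anticipate_next(transitions, "sad")
```

The anticipated emotion would then be passed to the strategy planner as an extra signal; how that signal is used is outside the scope of this sketch.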

2024


Temporal knowledge graph question answering (TKGQA) is a significantly challenging task because temporal constraints are hidden in the questions and answers must be sought from dynamic structured knowledge. Although large language models (LLMs) have made considerable progress in their reasoning ability over structured data, their application to the TKGQA task is a relatively unexplored area. This paper proposes a novel generative temporal knowledge graph question answering framework, GenTKGQA, which guides LLMs to answer temporal questions through two phases: Subgraph Retrieval and Answer Generation. First, we exploit the LLM’s intrinsic knowledge to mine temporal constraints and structural links in the questions without extra training, thus narrowing down the subgraph search space in both temporal and structural dimensions. Next, we design virtual knowledge indicators to fuse the graph neural network signals of the subgraph and the text representations of the LLM in a non-shallow way, which helps the open-source LLM deeply understand the temporal order and structural dependencies among the retrieved facts through instruction tuning. Experimental results on two widely used datasets demonstrate the superiority of our model.

2023


Temporal knowledge graph completion, which predicts missing links for incomplete temporal knowledge graphs (TKGs), is gaining increasing attention. Most existing works have achieved good results by incorporating time information into static knowledge graph embedding methods. However, they ignore the contextual nature of the TKG structure, i.e., a query-specific subgraph contains both structural and temporal neighboring facts. This paper presents SToKE, a novel method that employs a pre-trained language model (PLM) to learn joint Structural and Temporal Contextualized Knowledge Embeddings. Specifically, we first construct an event evolution tree (EET) for each query to enable PLMs to handle the TKG; the EET can be seen as a structured event sequence recording query-relevant structural and temporal contexts. We then propose a novel temporal embedding and structural matrix to learn the time information and structural dependencies of facts in the EET. Finally, we formulate TKG completion as a mask prediction problem by masking the missing entity of the query to fine-tune pre-trained language models. Experimental results on three widely used datasets show the superiority of our model.

Co-authors

Yu Tang 1

Venues

Findings 4
