Wenqi Yang

2026

GASE: Graph-Aware Semantic Embedding Learning with Frozen LLMs for Text-Attributed Graphs
Mingqian Ding | Jianjun Li | Wenqi Yang | Zhibo Zhang | Yushen Fang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Large Language Models (LLMs) have shown strong potential for text-attributed graph (TAG) learning, yet effectively integrating LLM semantics with graph structural information remains challenging. Embeddings obtained from frozen LLMs lack topology awareness, while fine-tuning LLMs is often computationally expensive. Moreover, LLM embeddings are high-dimensional, and naively reducing dimensionality tends to destroy semantics. To address these issues, we propose GASE, a framework for learning Graph-Aware Semantic Embeddings using frozen LLMs. GASE consists of two key stages: First, we introduce a Training-Free Structure-Aware Semantic Extraction (TSSE) module. Through inter-layer semantic feedback and progressive masked attention, it efficiently compresses and propagates semantic context from neighboring nodes without updating LLM parameters. Second, we propose a Subspace Decomposition and Structural Injection (SDSI) strategy. Embeddings obtained from TSSE are decomposed into a semantic-rich subspace and a structural injection subspace, and structural signals are injected into the latter, which preserves original semantics while integrating graph information. Experiments demonstrate that GASE outperforms state-of-the-art baselines on node classification and achieves a 5× speedup over fine-tuning-based methods.
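The SDSI idea described in the abstract, keeping one part of each embedding untouched while injecting graph signals into the other, can be illustrated with a small sketch. This is not the authors' implementation. The abstract does not say how the subspaces are constructed or how structure is injected, so the sketch assumes a plain coordinate split (the first `semantic_dim` entries form the semantic part) and a neighbor-mean blend controlled by a hypothetical `alpha` parameter. The function name `inject_structure` is also illustrative.

```python
from typing import Dict, List

Vector = List[float]


def inject_structure(
    embeddings: Dict[str, Vector],
    neighbors: Dict[str, List[str]],
    semantic_dim: int,
    alpha: float = 0.5,
) -> Dict[str, Vector]:
    """Blend neighbor information into the trailing coordinates only.

    Assumption (not from the paper): the "semantic" subspace is the first
    `semantic_dim` coordinates and the "injection" subspace is the rest.
    """
    result: Dict[str, Vector] = {}
    for node, vec in embeddings.items():
        semantic_part = vec[:semantic_dim]    # left exactly as it was
        injection_part = vec[semantic_dim:]   # receives the graph signal
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Mean of the neighbors' injection-subspace coordinates,
            # always read from the original (un-updated) embeddings.
            mean = [
                sum(embeddings[n][semantic_dim + i] for n in nbrs) / len(nbrs)
                for i in range(len(injection_part))
            ]
            injection_part = [
                (1 - alpha) * x + alpha * m
                for x, m in zip(injection_part, mean)
            ]
        result[node] = semantic_part + injection_part
    return result
```

In this sketch the semantic coordinates of every node come out identical to the input, which mirrors the abstract's claim that the original semantics are preserved while graph information is integrated.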


MARD: Module-Aware Reasoning Distillation for Language Models with Adaptive Supervision
Wenqi Yang | Jianjun Li | Zhibo Zhang | Mingqian Ding | Yushen Fang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Multi-step reasoning remains challenging for language models with limited capacity. While recent reasoning distillation approaches transfer chain-of-thought supervision from large teacher models, they typically apply uniform supervision across all Transformer components, overlooking the fact that different modules contribute unequally to reasoning. We propose Module-Aware Reasoning Distillation, a parameter-efficient framework that explicitly targets key Transformer components for effective reasoning transfer. Through systematic analysis, we identify the feed-forward network projections and the output projection of self-attention as primary bottlenecks for reasoning. Based on these findings, we introduce lightweight adapter modules at these components while freezing the backbone parameters, enabling focused and efficient distillation. Our approach adopts an offline distillation setting, where a strong teacher model provides reasoning trajectories in advance, and incorporates an adaptive supervision strategy that adjusts the strength of reasoning-related losses according to problem difficulty. Experiments on mathematical reasoning benchmarks demonstrate consistent improvements over strong baselines, and ablation studies confirm the importance of both module-aware placement and adaptive supervision.
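The adaptive supervision strategy mentioned in the abstract, scaling reasoning-related losses by problem difficulty, might look roughly like the sketch below. The abstract gives neither the weighting function nor its direction, so everything here is an assumption made for illustration: difficulty is taken to be a score in [0, 1], the weight grows linearly with it, and the names `reasoning_weight`, `total_loss`, `w_min`, and `w_max` are hypothetical.

```python
def reasoning_weight(difficulty: float, w_min: float = 0.2, w_max: float = 1.0) -> float:
    """Map a difficulty score in [0, 1] to a weight for the reasoning loss.

    Assumption (not from the paper): harder problems get a larger weight,
    interpolated linearly between w_min and w_max.
    """
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must lie in [0, 1]")
    return w_min + (w_max - w_min) * difficulty


def total_loss(answer_loss: float, reasoning_loss: float, difficulty: float) -> float:
    """Combine the answer loss with a difficulty-weighted reasoning loss."""
    return answer_loss + reasoning_weight(difficulty) * reasoning_loss
```

With these assumed defaults, an easy problem contributes its reasoning loss at weight 0.2 and the hardest problem at weight 1.0. A real schedule could just as well be non-linear or decrease with difficulty.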
