Bo He

2026

The massive size of Large Language Models (LLMs) imposes substantial computational and storage burdens, particularly on devices with limited hardware resources. Compared to foundation models, smaller and more specialized models are often more suitable for practical deployment. Existing customization approaches, such as the conventional “prune-then-finetune” paradigm or task-agnostic deployment strategies, either incur excessive computational costs or lead to suboptimal task performance. The recently popular Mixture-of-Experts (MoE) architecture exhibits a strong ability to mitigate inter-task interference, offering a new perspective on model deployment. In this paper, we introduce ModularMoE, a training framework that converts pre-trained LLMs into parameter-sharing MoE models for lightweight deployment. Exploiting the emergent modularity within LLMs, we split the feed-forward layers into multiple disjoint modules. Each expert is then constructed as a combination of such modules, enabling knowledge sharing across experts and thereby improving parameter efficiency within MoEs. Extensive experiments across multiple downstream tasks demonstrate that ModularMoE outperforms other state-of-the-art baselines at the same sparsity level, achieving an average performance improvement of 4.10% to 28.75% while delivering up to 2.71× inference speedup.

2025

pdf bib abs

Entity alignment (EA) is crucial for integrating multi-source knowledge graphs (KGs), aiming to identify equivalent entities across different graphs. However, most existing EA decoding methods rely on both entity and relation embeddings, limiting their generalizability and efficiency, especially in GNN-based models. To address these challenges, we propose Triple Feature Propagation (TFP), an adaptable and fast EA decoding framework that only utilizes entity embeddings. TFP reconstructs KG representation by maximizing the smoothness of entity embeddings. The discretized smoothness-maximization process yields the explicit Euler solution of TFP. We also generalize multi-view matrices: entity-to-entity, entity-to-relation, relation-to-entity, and relation-to-triple, to capture structural diversity. Extensive experiments on public datasets demonstrate that TFP is fast and adaptable to various encoders, achieving comparable results to state-of-the-art methods in under 6 seconds, and surpassing them in many cases.

Co-authors

Qi Qi 1

Venues

Findings2

Fix author