Junyang Zhang


2026

Mixture-of-Experts (MoE) models offer a promising path for scaling model capacity, yet their massive memory footprint poses significant challenges for deployment on resource-constrained edge devices. Existing solutions, such as static pruning or dynamic offloading, often struggle to balance model accuracy with inference latency due to irreversible information loss or prohibitive I/O overhead. In this paper, we propose LightMoE, a novel framework for memory-efficient MoE inference that exploits the inherent functional redundancy and temporal locality of expert activation. LightMoE employs a frequency-aware expert initialization strategy to retain a compact core of resident experts and introduces a similarity-based redirection mechanism to compensate for missing experts without incurring I/O costs. Furthermore, it incorporates a lightweight runtime manager that performs coarse-grained, task-level expert replacement to adapt to shifting data distributions. Empirical evaluations on representative edge platforms demonstrate that LightMoE achieves a superior accuracy-efficiency trade-off, improving average accuracy by 4.3% over static pruning and 2.4% over dynamic swapping methods, while maintaining inference latency comparable to strictly pruned models.
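As a rough illustration of the two ideas named in the abstract above, the sketch below keeps the most frequently activated experts resident and redirects each missing expert to its most similar resident one. The abstract does not say how frequency or similarity is measured, so the function names, the use of raw activation counts, and cosine similarity over flattened expert weights are all assumptions made for this example, not the LightMoE implementation.

```python
import numpy as np

def select_resident_experts(activation_counts, budget):
    """Keep the `budget` most frequently activated experts (assumed criterion)."""
    order = np.argsort(-np.asarray(activation_counts), kind="stable")
    return sorted(int(i) for i in order[:budget])

def build_redirect_table(expert_weights, resident):
    """Map every expert id to a resident expert id.

    Resident experts map to themselves; each non-resident expert maps to the
    resident expert whose flattened weights have the highest cosine similarity
    (a hypothetical similarity measure chosen for illustration).
    """
    flat = np.stack([np.asarray(w, dtype=float).ravel() for w in expert_weights])
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-12)  # guard against all-zero weights
    sims = flat @ flat[resident].T          # shape: (num_experts, num_resident)
    nearest = np.argmax(sims, axis=1)
    table = {e: resident[int(nearest[e])] for e in range(len(expert_weights))}
    for r in resident:                      # resident experts never redirect
        table[r] = r
    return table

# Toy example: four experts with 2-D "weights" and a budget of two.
counts = [5, 1, 9, 3]
weights = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
resident = select_resident_experts(counts, budget=2)
redirect = build_redirect_table(weights, resident)
```

At inference time, a router output `e` would be looked up as `redirect[e]`, so a request for an expert that is not in memory is served by a resident expert with no load from storage.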

2022

We study the problem of extracting N-ary relation tuples from scientific articles. This task is challenging because the target knowledge tuples can reside in multiple parts and modalities of the document. Our proposed method, ReSel, decomposes this task into a two-stage procedure that first retrieves the most relevant paragraph or table and then selects the target entity from the retrieved component. For the high-level retrieval stage, ReSel uses a simple and effective feature set that captures multi-level lexical and semantic similarities between the query and components. For the low-level selection stage, ReSel builds a cross-modal entity correlation graph along with a multi-view architecture that models both semantic and document-structural relations between entities. Our experiments on three scientific information extraction datasets show that ReSel significantly outperforms state-of-the-art baselines.
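The two-stage retrieve-then-select control flow described in the abstract above can be sketched as follows. This shows only the pipeline shape: the scoring functions are supplied by the caller, and the `token_overlap` and digit-based scorers in the example are hypothetical stand-ins for ReSel's feature set and its entity correlation graph, which the abstract does not specify in detail.

```python
def token_overlap(query, text):
    """Fraction of query tokens that also appear in the text (toy lexical score)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def retrieve_then_select(query, components, score_component, score_entity):
    """Stage 1: pick the best-scoring component (paragraph or table).
    Stage 2: pick the best-scoring entity inside that component.

    `components` is a list of dicts with keys "text" and "entities"
    (an assumed layout for this example).
    """
    if not components:
        return None, None
    best = max(components, key=lambda c: score_component(query, c["text"]))
    if not best["entities"]:
        return best, None
    chosen = max(best["entities"], key=lambda e: score_entity(query, e, best))
    return best, chosen

# Toy example: the query asks for a numeric property value.
components = [
    {"text": "Polymer A has a melting point of 120 C",
     "entities": ["Polymer A", "120 C"]},
    {"text": "We thank our funders", "entities": []},
]

def prefers_numeric(query, entity, component):
    # Hypothetical entity scorer: favor entities that contain a digit.
    return any(ch.isdigit() for ch in entity)

component, entity = retrieve_then_select(
    "melting point polymer A", components, token_overlap, prefers_numeric
)
```

Keeping the scorers as parameters mirrors the decomposition in the abstract: the retrieval and selection stages can be improved independently without changing the pipeline.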