AEA: Adaptive Expert Allocation Improves Sentence Embeddings from Mixture-of-Experts LLM

Shufan Yang; Zifeng Cheng; Zhiwei Jiang; Qingfeng Qi; Yafeng Yin; Cong Wang; Ao Zhou; Qing Gu

Abstract

Extracting embeddings directly from Mixture-of-Experts (MoE) models is a promising yet underexplored direction that requires no additional data or fine-tuning. While previous studies have utilized semantic compression prompts or expert routing information to improve sentence embeddings, they typically allocate a fixed number of experts uniformly across all layers and tokens, ignoring inter-layer and inter-token heterogeneity. In this work, we identify two key observations in MoE models: (1) layer-wise variations in expert homogeneity, suggesting that different layers require different expert budgets, and (2) token-wise contribution imbalance, indicating that different tokens should also be allocated different numbers of experts. To address these issues, we propose an Adaptive Expert Allocation (AEA) framework that dynamically performs both layer-wise and token-wise expert allocation to enhance embedding quality. Specifically, AEA allocates fewer experts to layers with higher homogeneity and to tokens with lower attention importance, where layer-wise homogeneity is determined by the similarity among embeddings produced by the experts in each layer. Notably, our method is plug-and-play, seamlessly integrates with existing prompt engineering methods, and introduces no additional time overhead. Experiments on the STS tasks demonstrate that AEA consistently improves embedding quality across multiple MoE models.
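The allocation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that layer-wise homogeneity comes from the similarity among expert embeddings and that budgets shrink for homogeneous layers and low-attention tokens. The use of mean pairwise cosine similarity, the linear score-to-budget rule, and every name below (`layer_homogeneity`, `budget_from_score`, `allocate`, `k_min`, `k_max`) are assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layer_homogeneity(expert_embeddings):
    """Assumed measure: mean pairwise cosine similarity among the
    embeddings produced by the experts of one layer."""
    pairs = list(combinations(expert_embeddings, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def budget_from_score(score, k_min, k_max):
    """Hypothetical linear rule: clamp score to [0, 1], then map it
    to an integer expert budget in [k_min, k_max]."""
    score = min(max(score, 0.0), 1.0)
    return k_min + round(score * (k_max - k_min))


def allocate(expert_embeddings_per_layer, token_importance, k_min=1, k_max=4):
    """Return budgets[layer][token], the number of experts to use.

    Layers with higher homogeneity get a lower cap; within a layer,
    tokens with lower attention importance get fewer experts.
    """
    top = max(token_importance)
    budgets = []
    for layer in expert_embeddings_per_layer:
        layer_cap = budget_from_score(1.0 - layer_homogeneity(layer), k_min, k_max)
        budgets.append(
            [budget_from_score(w / top, k_min, layer_cap) for w in token_importance]
        )
    return budgets
```

For example, a layer whose experts produce identical embeddings collapses to the minimum budget for every token, while a layer with orthogonal expert embeddings keeps the full range and spends it on the most attended tokens.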

Anthology ID: 2026.acl-long.1331
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 28837–28851
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1331/
Cite (ACL): Shufan Yang, Zifeng Cheng, Zhiwei Jiang, Qingfeng Qi, Yafeng Yin, Cong Wang, Ao Zhou, and Qing Gu. 2026. AEA: Adaptive Expert Allocation Improves Sentence Embeddings from Mixture-of-Experts LLM. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 28837–28851, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): AEA: Adaptive Expert Allocation Improves Sentence Embeddings from Mixture-of-Experts LLM (Yang et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-long.1331.pdf
Checklist: 2026.acl-long.1331.checklist.pdf
