Li He
Also published as: 丽 何, LI He
2026
From Pseudo-Balancing to True Specialization: Memory-Aware Routing for Mixture-of-Experts
Peixuan Hou | Yunbo Hou | Bin Chen | LI He | Jian Xu | Weiping Li | Bo Zheng | Guojie Song
Findings of the Association for Computational Linguistics: ACL 2026
Mixture-of-Experts (MoE) trains large models efficiently by using sparse activation to lower costs, selecting a few experts for each input based on data characteristics. In MoE, an unbalanced expert load leads to inefficient expert utilization and routing collapse. Existing methods commonly adopt an expert-centered balancing strategy to address this, prioritizing equal utilization of experts over semantic alignment between tokens and experts. However, this can lead to a pseudo-balance phenomenon: to keep expert load balanced, the same input is randomly routed to different experts across training steps instead of to the best-matching one. This introduces two critical issues: (1) severe knowledge overlap among experts, resulting in redundant representations and inefficient parameter utilization; and (2) difficulty in forming and stabilizing expert specialization. These issues limit the scalability of models, especially large language models (LLMs). To address these limitations, we introduce Memory-Aware Routing (MAR), a training-phase approach that enhances existing load-balancing strategies. By equipping each expert with a memory buffer, our method explicitly models each expert's long-term preferences, allowing historical experience to guide routing. This ensures that tokens are routed more consistently to compatible experts, mitigating the pseudo-balance problem while maintaining global load balance and fostering expert specialization. Experimental results show that MAR improves expert specialization by 35% and downstream accuracy by 2%-25%, doubles parameter efficiency, and matches baseline performance with only half the experts.
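The abstract does not give MAR's exact formulation. As a rough illustration only, the Python sketch below assumes one possible reading of "a memory buffer per expert": each expert keeps an exponential moving average (EMA) of the token representations routed to it, and routing adds a similarity-to-memory bonus to the logits produced by an existing load-balanced router. The class `MemoryAwareRouter` and its `alpha` and `decay` parameters are hypothetical names, not the authors' code, and the load-balancing loss itself is not shown.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class MemoryAwareRouter:
    """Hypothetical sketch: each expert keeps an EMA 'memory' of the tokens it received.

    alpha and decay are illustrative knobs, not values from the paper.
    """

    def __init__(self, num_experts: int, dim: int, alpha: float = 1.0, decay: float = 0.9):
        self.memory: List[List[float]] = [[0.0] * dim for _ in range(num_experts)]
        self.alpha = alpha  # weight of the memory (long-term preference) term
        self.decay = decay  # EMA decay used when updating an expert's memory

    def scores(self, token: Sequence[float], router_logits: Sequence[float]) -> List[float]:
        # Logit from any existing (load-balanced) router plus the similarity
        # between the token and each expert's historical memory.
        return [logit + self.alpha * cosine(token, mem)
                for logit, mem in zip(router_logits, self.memory)]

    def route(self, token: Sequence[float], router_logits: Sequence[float], k: int = 1) -> List[int]:
        s = self.scores(token, router_logits)
        # Top-k experts by combined score (ties keep the lower expert index).
        chosen = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
        # Update each selected expert's memory with the routed token.
        for e in chosen:
            self.memory[e] = [self.decay * m + (1.0 - self.decay) * x
                              for m, x in zip(self.memory[e], token)]
        return chosen


router = MemoryAwareRouter(num_experts=2, dim=2)
first = router.route([1.0, 0.0], [0.0, 0.1])    # logits favor expert 1 -> [1]
second = router.route([1.0, 0.0], [0.05, 0.0])  # logits now lean to expert 0, memory keeps expert 1 -> [1]
```

The second call shows the intended effect in miniature: a small fluctuation in the router logits no longer flips the same token to a different expert, because the memory term rewards the expert that has historically handled similar tokens.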
2024
UFSC: A Unified Feature Space Construction for Zero-Shot Relation Extraction
Yuchen Liu (刘雨辰) | Jianyong Duan (段建勇) | Kang Sun (孙康) | Qing Zhang (张晴) | Li He (何丽) | Hao Wang (王昊) | Jie Liu (刘杰)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
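Zero-shot relation extraction (ZSRE) aims to learn, from seen relations, the ability to extract unseen relations. Several studies have shown that predicting unseen relations by matching sample sentences against relation descriptions is an effective approach to zero-shot relation extraction. However, existing matching-framework methods rarely unify the feature spaces of sample sentences and relation descriptions, and they do not align the two sets of features. This paper therefore proposes a unified feature space construction method designed for matching-framework zero-shot relation extraction. It unifies the encoding modules for sample sentences and relation descriptions and, on this basis, introduces a feature similarity loss. In addition, to mitigate the clustering of features in the space, it introduces a feature uniformization module that aims to build a feature space in which features are more uniformly distributed. The proposed method improves performance: compared with the previous best results, F1 improves by an average of 1.6% and 3.4% on the FewRel and Wiki-ZSL datasets, respectively, demonstrating the effectiveness of the unified feature space construction and the feature uniformization module.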
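The abstract does not specify the encoder, the similarity loss, or how the uniformization module is built. The Python sketch below assumes precomputed embeddings from a shared encoder and illustrates two generic pieces: matching a sentence to the unseen relation whose description embedding is most similar (cosine similarity), and, as a stand-in for a uniformization objective, the uniformity loss of Wang and Isola (2020), which is larger when embeddings cluster together. The function names are hypothetical, and the uniformity loss is a swapped-in, well-known technique, not necessarily the module used in UFSC.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def predict_relation(sentence_emb: Sequence[float],
                     relation_embs: Dict[str, Sequence[float]]) -> str:
    """Return the relation whose description embedding has the highest cosine similarity."""
    s = l2_normalize(sentence_emb)

    def sim(name: str) -> float:
        r = l2_normalize(relation_embs[name])
        return sum(a * b for a, b in zip(s, r))

    return max(relation_embs, key=sim)


def uniformity_loss(embs: Sequence[Sequence[float]], t: float = 2.0) -> float:
    """Uniformity loss (Wang & Isola, 2020): log of the mean of exp(-t * ||x - y||^2)
    over all distinct pairs of L2-normalized embeddings. Needs at least two embeddings.
    Lower values mean the embeddings are more spread out on the unit sphere."""
    z = [l2_normalize(e) for e in embs]
    terms = [math.exp(-t * sum((a - b) ** 2 for a, b in zip(z[i], z[j])))
             for i in range(len(z)) for j in range(i + 1, len(z))]
    return math.log(sum(terms) / len(terms))


relations = {"founded_by": [0.9, 0.1], "located_in": [0.0, 1.0]}
print(predict_relation([1.0, 0.0], relations))          # founded_by
print(uniformity_loss([[1.0, 0.0], [-1.0, 0.0]]))       # about -8.0 (well spread)
print(uniformity_loss([[1.0, 0.0], [0.99, 0.01]]))      # close to 0.0 (clustered)
```

Minimizing a term like this during training pushes sentence and relation-description features apart on the unit sphere, which is one standard way to counter the feature aggregation the abstract describes.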
2023
AMR-TST: Abstract Meaning Representation-based Text Style Transfer
Kaize Shi | Xueyao Sun | Li He | Dingxian Wang | Qing Li | Guandong Xu
Findings of the Association for Computational Linguistics: ACL 2023
Abstract Meaning Representation (AMR) is a semantic representation that can enhance natural language generation (NLG) by providing a logical semantic input. In this paper, we propose AMR-TST, an AMR-based text style transfer (TST) technique. AMR-TST converts the source text to an AMR graph and generates the transferred text from the AMR graph after it has been modified by a TST policy named style rewriting. Our method combines the explainability of explicit TST methods with the diversity of implicit ones. The experiments show that the proposed method achieves state-of-the-art results compared with other baseline models in both automatic and human evaluations. Qualitative evaluation of the generated text shows that AMR-TST has significant advantages in preserving semantic features and reducing hallucinations. To the best of our knowledge, this work is the first to apply an AMR method focusing on node-level features to the TST task.