Daize Dong
2025
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Tong Zhu | Daize Dong | Xiaoye Qu | Jiacheng Ruan | Wenliang Chen | Yu Cheng
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
2024
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-Training
Tong Zhu | Xiaoye Qu | Daize Dong | Jiacheng Ruan | Jingqi Tong | Conghui He | Yu Cheng
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
2023
PAD-Net: An Efficient Framework for Dynamic Networks
Shwai He | Liang Ding | Daize Dong | Boan Liu | Fuqiang Yu | Dacheng Tao
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2022
SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters
Shwai He | Liang Ding | Daize Dong | Jeremy Zhang | Dacheng Tao
Findings of the Association for Computational Linguistics: EMNLP 2022