2024
pdf
abs
PartialFormer: Modeling Part Instead of Whole for Machine Translation
Tong Zheng
|
Bei Li
|
Huiwen Bao
|
Jiale Wang
|
Weiqiao Shan
|
Tong Xiao
|
JingBo Zhu
Findings of the Association for Computational Linguistics: ACL 2024
The design choices in Transformer feed-forward neural networks have resulted in significant computational and parameter overhead. In this work, we emphasize the importance of hidden dimensions in designing lightweight FFNs, a factor often overlooked in previous architectures. Guided by this principle, we introduce PartialFormer, a parameter-efficient Transformer architecture utilizing multiple smaller FFNs to reduce parameters and computation while maintaining essential hidden dimensions. These smaller FFNs are integrated into a multi-head attention mechanism for effective collaboration. We also propose a tailored head scaling strategy to enhance PartialFormer’s capabilities. Furthermore, we present a residual-like attention calculation to improve depth scaling within PartialFormer. Extensive experiments on 9 translation tasks and 1 abstractive summarization task validate the effectiveness of our PartialFormer approach on machine translation and summarization tasks. Our code would be available at: https://github.com/zhengkid/PartialFormer.
2022
pdf
abs
The NiuTrans Machine Translation Systems for WMT22
Weiqiao Shan
|
Zhiquan Cao
|
Yuchen Han
|
Siming Wu
|
Yimin Hu
|
Jie Wang
|
Yi Zhang
|
Hou Baoyu
|
Hang Cao
|
Chenghao Gao
|
Xiaowen Liu
|
Tong Xiao
|
Anxiang Ma
|
Jingbo Zhu
Proceedings of the Seventh Conference on Machine Translation (WMT)
This paper describes the NiuTrans neural machine translation systems of the WMT22 General MT constrained task. We participate in four directions, including Chinese→English, English→Croatian, and Livonian↔English. Our models are based on several advanced Transformer variants, e.g., Transformer-ODE, Universal Multiscale Transformer (UMST). The main workflow consists of data filtering, large-scale data augmentation (i.e., iterative back-translation, iterative knowledge distillation), and specific-domain fine-tuning. Moreover, we try several multi-domain methods, such as a multi-domain model structure and a multi-domain data clustering method, to rise to this year’s newly proposed multi-domain test set challenge. For low-resource scenarios, we build a multi-language translation model to enhance the performance, and try to use the pre-trained language model (mBERT) to initialize the translation model.
2020
pdf
abs
The NiuTrans Machine Translation Systems for WMT20
Yuhao Zhang
|
Ziyang Wang
|
Runzhe Cao
|
Binghao Wei
|
Weiqiao Shan
|
Shuhan Zhou
|
Abudurexiti Reheman
|
Tao Zhou
|
Xin Zeng
|
Laohu Wang
|
Yongyu Mu
|
Jingnan Zhang
|
Xiaoqian Liu
|
Xuanjun Zhou
|
Yinqiao Li
|
Bei Li
|
Tong Xiao
|
Jingbo Zhu
Proceedings of the Fifth Conference on Machine Translation
This paper describes NiuTrans neural machine translation systems of the WMT20 news translation tasks. We participated in Japanese<->English, English->Chinese, Inuktitut->English and Tamil->English total five tasks and rank first in Japanese<->English both sides. We mainly utilized iterative back-translation, different depth and widen model architectures, iterative knowledge distillation and iterative fine-tuning. And we find that adequately widened and deepened the model simultaneously, the performance will significantly improve. Also, iterative fine-tuning strategy we implemented is effective during adapting domain. For Inuktitut->English and Tamil->English tasks, we built multilingual models separately and employed pretraining word embedding to obtain better performance.