Chang Jin
2022
Manifold’s English-Chinese System at WMT22 General MT Task
Chang Jin | Tingxun Shi | Zhengshan Xue | Xiaodong Lin
Proceedings of the Seventh Conference on Machine Translation (WMT)
Manifold’s English-Chinese System at WMT22 is an ensemble of four models trained with different configurations and fine-tuned with scheduled sampling. The four configurations are DeepBig (XenC), DeepLarger (XenC), DeepBig-TalkingHeads (XenC) and DeepBig (LaBSE). Concretely, DeepBig extends Transformer-Big to 24 encoder layers. DeepLarger has 20 encoder layers and a feed-forward network (FFN) dimension of 8192. TalkingHeads applies the talking-heads trick. For the XenC configurations, we used XenC to select monolingual and parallel data similar to the past newstest datasets; for the LaBSE configuration, we cleaned the officially provided parallel data using the pretrained LaBSE model. According to the officially released automatic metrics leaderboard, our final constrained system ranked 1st among all systems when evaluated by bleu-all, chrf-all and COMET-B, and 2nd by COMET-A.
2021
基于层间知识蒸馏的神经机器翻译(Inter-layer Knowledge Distillation for Neural Machine Translation)
Chang Jin (金畅) | Renchong Duan (段仁翀) | Nini Xiao (肖妮妮) | Xiangyu Duan (段湘煜)
Proceedings of the 20th Chinese National Conference on Computational Linguistics
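Neural machine translation (NMT) typically uses a multi-layer neural network architecture. As the network deepens, the features it extracts become increasingly abstract, yet in existing NMT models the abstract information from the higher layers is used only when predicting the output distribution. To make better use of this information, this paper proposes inter-layer knowledge distillation, which aims to transfer the abstract knowledge of the higher layers to the lower layers so that the lower layers capture more useful information, thereby improving the translation quality of the whole model. Unlike conventional knowledge distillation between a teacher model and a student model, inter-layer knowledge distillation transfers knowledge between different layers within the same model. Experiments on three datasets (Chinese-English, English-Romanian and German-English) show that inter-layer distillation effectively improves translation performance, with BLEU gains of 1.19, 0.72 and 1.35 respectively. They also show that making effective use of higher-layer information can improve the translation quality of neural network models.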