Zhen Xing
2023
TranSFormer: Slow-Fast Transformer for Machine Translation
Bei Li | Yi Jing | Xu Tan | Zhen Xing | Tong Xiao | Jingbo Zhu
Findings of the Association for Computational Linguistics: ACL 2023
Learning multiscale Transformer models has proven to be a viable approach to improving machine translation systems. Prior research has primarily treated subwords as the basic units in developing such systems. However, the incorporation of fine-grained character-level features into multiscale Transformers has not yet been explored. In this work, we present a Slow-Fast two-stream learning model, referred to as TranSFormer, which uses a “slow” branch to process subword sequences and a “fast” branch to process longer character sequences. The model is efficient because the fast branch is kept very lightweight by reducing its width, yet it still provides useful fine-grained features for the slow branch. TranSFormer shows consistent BLEU improvements (more than 1 BLEU point) on several machine translation benchmarks.
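To make the efficiency argument concrete, the following is a minimal back-of-the-envelope sketch, not the authors' implementation. It assumes the standard Transformer layer layout (four attention projections plus a two-layer feed-forward network, biases and layer normalization ignored). The widths, the helper name `layer_params`, and the example segmentation are all hypothetical and chosen only for illustration; the abstract does not state the actual settings.

```python
def layer_params(d_model, d_ff):
    """Approximate weight count of one Transformer encoder layer.

    Counts only the large matrices; biases and layer norm are ignored.
    """
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    feed_forward = 2 * d_model * d_ff   # the two linear maps of the FFN
    return attention + feed_forward


# Hypothetical widths (assumptions, not the paper's configuration).
slow = layer_params(d_model=512, d_ff=2048)   # wide branch for subwords
fast = layer_params(d_model=128, d_ff=512)    # narrow branch for characters

# Two views of the same sentence. The subword split is a made-up
# BPE-style segmentation used only to contrast sequence lengths.
subwords = ["trans@@", "lation", "is", "hard"]
characters = list("translation is hard")

print(slow, fast, slow // fast)         # 3145728 196608 16
print(len(subwords), len(characters))   # 4 19
```

Under these assumed widths, the character sequence is several times longer than the subword sequence, but each narrow layer holds about 16 times fewer weights than a wide one. This is the sense in which a branch over longer sequences can remain lightweight.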