Mei Tu

2022

pdf abs
Long-range Sequence Modeling with Predictable Sparse Attention
Yimeng Zhuang | Jing Zhang | Mei Tu
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Self-attention mechanism has been shown to be an effective approach for capturing global context dependencies in sequence modeling, but it suffers from quadratic complexity in time and memory usage. Due to the sparsity of the attention matrix, much computation is redundant. Therefore, in this paper, we design an efficient Transformer architecture, named Fourier Sparse Attention for Transformer (FSAT), for fast long-range sequence modeling. We provide a brand-new perspective for constructing sparse attention matrix, i.e. making the sparse attention matrix predictable. Two core sub-modules are: (1) A fast Fourier transform based hidden state cross module, which captures and pools L² semantic combinations in 𝒪(Llog L) time complexity. (2) A sparse attention matrix estimation module, which predicts dominant elements of an attention matrix based on the output of the previous hidden state cross module. By reparameterization and gradient truncation, FSAT successfully learned the index of dominant elements. The overall complexity about the sequence length is reduced from 𝒪(L²) to 𝒪(Llog L). Extensive experiments (natural language, vision, and math) show that FSAT remarkably outperforms the standard multi-head attention and its variants in various long-sequence tasks with low computational costs, and achieves new state-of-the-art results on the Long Range Arena benchmark.

2019

pdf abs
End-to-end Speech Translation System Description of LIT for IWSLT 2019
Mei Tu | Wei Liu | Lijie Wang | Xiao Chen | Xue Wen
Proceedings of the 16th International Conference on Spoken Language Translation

This paper describes our end-to-end speech translation system for the speech translation task of lectures and TED talks from English to German for IWSLT Evaluation 2019. We propose layer-tied self-attention for end-to-end speech translation. Our method takes advantage of sharing weights of speech encoder and text decoder. The representation of source speech and the representation of target text are coordinated layer by layer, so that the speech and text can learn a better alignment during the training procedure. We also adopt data augmentation to enhance the parallel speech-text corpus. The En-De experimental results show that our best model achieves 17.68 on tst2015. Our ASR achieves WER of 6.6% on TED-LIUM test set. The En-Pt model can achieve about 11.83 on the MuST-C dev set.

Although statistical machine translation (SMT) has made great progress since it came into being, the translation of numerical and time expressions is still far from satisfactory. Generally speaking, numbers are likely to be out-of-vocabulary (OOV) words due to their non-exhaustive characteristics even when the size of training data is very large, so it is difficult to obtain accurate translation results for the infinite set of numbers only depending on traditional statistical methods. We propose a language-independent framework to recognize and translate numbers more precisely by using a rule-based method. Through designing operators, we succeed to make rules educible and totally separate from codes, thus, we can extend rules to various language-pairs without re-coding, which contributes a lot to the efficient development of an SMT system with good portability. We classify numbers and time expressions into seven types, which are Arabic number, cardinal numbers, ordinal numbers, date, time of day, day of week and figures. A greedy algorithm is developed to deal with rule conflicts. Experiments have shown that our approach can significantly improve the translation performance.

Co-authors

Lijie Wang 1

Xiao Chen 1

Xue Wen 1

Venues

acl3
iwslt2

Mei Tu

2022

2019

2014

2013

2012

Co-authors

Venues