Javier Ferrando


2022

pdf
On the Locality of Attention in Direct Speech Translation
Belen Alastruey | Javier Ferrando | Gerard I. Gállego | Marta R. Costa-jussà
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the self-attention mechanism complexity scales quadratically with the sequence length, creating an obstacle for tasks involving long sequences, like in the speech domain. In this paper, we discuss the usefulness of self-attention for Direct Speech Translation. First, we analyze the layer-wise token contributions in the self-attention of the encoder, unveiling local diagonal patterns. To prove that some attention weights are avoidable, we propose to substitute the standard self-attention with a local efficient one, setting the amount of context used based on the results of the analysis. With this approach, our model matches the baseline performance, and improves the efficiency by skipping the computation of those weights that standard attention discards.

2021

pdf
Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions
Javier Ferrando | Marta R. Costa-jussà
Findings of the Association for Computational Linguistics: EMNLP 2021

This work proposes an extensive analysis of the Transformer architecture in the Neural Machine Translation (NMT) setting. Focusing on the encoder-decoder attention mechanism, we prove that attention weights systematically make alignment errors by relying mainly on uninformative tokens from the source sequence. However, we observe that NMT models assign attention to these tokens to regulate the contribution in the prediction of the two contexts, the source and the prefix of the target sequence. We provide evidence about the influence of wrong alignments on the model behavior, demonstrating that the encoder-decoder attention mechanism is well suited as an interpretability method for NMT. Finally, based on our analysis, we propose methods that largely reduce the word alignment error rate compared to standard induced alignments from attention weights.

pdf
The TALP-UPC Participation in WMT21 News Translation Task: an mBART-based NMT Approach
Carlos Escolano | Ioannis Tsiamas | Christine Basta | Javier Ferrando | Marta R. Costa-jussa | José A. R. Fonollosa
Proceedings of the Sixth Conference on Machine Translation

This paper describes the submission to the WMT 2021 news translation shared task by the UPC Machine Translation group. The goal of the task is to translate German to French (De-Fr) and French to German (Fr-De). Our submission focuses on fine-tuning a pre-trained model to take advantage of monolingual data. We fine-tune mBART50 using the filtered data, and additionally, we train a Transformer model on the same data from scratch. In the experiments, we show that fine-tuning mBART50 results in 31.69 BLEU for De-Fr and 23.63 BLEU for Fr-De, which increases 2.71 and 1.90 BLEU accordingly, as compared to the model we train from scratch. Our final submission is an ensemble of these two models, further increasing 0.3 BLEU for Fr-De.