@inproceedings{dai-etal-2022-bertology,
title = "{BERT}ology for Machine Translation: What {BERT} Knows about Linguistic Difficulties for Translation",
author = "Dai, Yuqian and
de Kamps, Marc and
Sharoff, Serge",
editor = "Calzolari, Nicoletta and
B{\'e}chet, Fr{\'e}d{\'e}ric and
Blache, Philippe and
Choukri, Khalid and
Cieri, Christopher and
Declerck, Thierry and
Goggi, Sara and
Isahara, Hitoshi and
Maegaard, Bente and
Mariani, Joseph and
Mazo, H{\'e}l{\`e}ne and
Odijk, Jan and
Piperidis, Stelios",
booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
month = jun,
year = "2022",
address = "Marseille, France",
publisher = "European Language Resources Association",
url = "https://preview.aclanthology.org/add-emnlp-2024-awards/2022.lrec-1.719/",
pages = "6674--6690",
abstract = "Pre-trained transformer-based models, such as BERT, have shown excellent performance in most natural language processing benchmark tests, but we still lack a good understanding of the linguistic knowledge of BERT in Neural Machine Translation (NMT). Our work uses syntactic probes and Quality Estimation (QE) models to analyze the performance of BERT`s syntactic dependencies and their impact on machine translation quality, exploring what kind of syntactic dependencies are difficult for NMT engines based on BERT. While our probing experiments confirm that pre-trained BERT {\textquotedblleft}knows{\textquotedblright} about syntactic dependencies, its ability to recognize them often decreases after fine-tuning for NMT tasks. We also detect a relationship between syntactic dependencies in three languages and the quality of their translations, which shows which specific syntactic dependencies are likely to be a significant cause of low-quality translations."
}
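
The abstract describes probing BERT's representations for syntactic dependencies. As a rough illustration of what such a probe looks like, here is a minimal sketch of a linear edge probe over frozen BERT hidden states, written with HuggingFace transformers. The toy sentences, the hand-labeled head/dependent pairs, the subword mean-pooling, and the pair-concatenation probe design are all illustrative assumptions, not the paper's actual experimental setup, which trains probes on treebank annotations.

# A minimal sketch of a linear "edge probe" over frozen BERT states.
# Hypothetical setup: toy data for illustration only, not the paper's probe.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
bert = AutoModel.from_pretrained("bert-base-cased")
bert.eval()  # frozen encoder: only the probe is trained

# Toy supervision: (sentence, head index, dependent index, relation label).
# Indices refer to whitespace tokens; real probes use treebank annotations.
DATA = [
    ("the cat sat", 2, 1, "nsubj"),
    ("the dog barked", 2, 1, "nsubj"),
    ("she ate the apple", 1, 3, "obj"),
    ("he read the book", 1, 3, "obj"),
]
LABELS = sorted({rel for *_, rel in DATA})
label_id = {rel: i for i, rel in enumerate(LABELS)}

def word_vectors(sentence):
    """Mean-pool subword states back into one vector per whitespace word."""
    words = sentence.split()
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        states = bert(**enc).last_hidden_state[0]  # (num_subwords, hidden)
    word_ids = enc.word_ids()
    vecs = []
    for i in range(len(words)):
        idx = [j for j, w in enumerate(word_ids) if w == i]
        vecs.append(states[idx].mean(dim=0))
    return torch.stack(vecs)

# Build probe inputs: concatenated head and dependent word vectors.
X, y = [], []
for sent, head, dep, rel in DATA:
    vecs = word_vectors(sent)
    X.append(torch.cat([vecs[head], vecs[dep]]))
    y.append(label_id[rel])
X, y = torch.stack(X), torch.tensor(y)

# Linear probe: if a single linear layer separates the relation labels,
# the information is (roughly) linearly encoded in the frozen states.
probe = torch.nn.Linear(X.shape[1], len(LABELS))
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.cross_entropy(probe(X), y)
    loss.backward()
    opt.step()

print("probe accuracy on toy data:",
      (probe(X).argmax(dim=1) == y).float().mean().item())

Keeping the encoder frozen is the point of the design: any accuracy the small linear probe achieves must come from information already present in BERT's representations, which is how the paper can compare pre-trained against NMT-fine-tuned states.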