@inproceedings{shavarani-sarkar-2021-better,
    title = "Better Neural Machine Translation by Extracting Linguistic Information from {BERT}",
    author = "Shavarani, Hassan S.  and
      Sarkar, Anoop",
    editor = "Merlo, Paola  and
      Tiedemann, J{\"o}rg  and
      Tsarfaty, Reut",
    booktitle = "Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume",
    month = apr,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://preview.aclanthology.org/ingest-emnlp/2021.eacl-main.241/",
    doi = "10.18653/v1/2021.eacl-main.241",
    pages = "2772--2783",
    abstract = "Adding linguistic information (syntax or semantics) to neural machine translation (NMT) has mostly focused on using point estimates from pre-trained models. Directly using the capacity of massive pre-trained contextual word embedding models such as BERT (Devlin et al., 2019) has been marginally useful in NMT because effective fine-tuning is difficult to obtain for NMT without making training brittle and unreliable. We augment NMT by extracting dense fine-tuned vector-based linguistic information from BERT instead of using point estimates. Experimental results show that our method of incorporating linguistic information helps NMT to generalize better in a variety of training contexts and is no more difficult to train than conventional Transformer-based NMT."
}