Abstract
Multiword Expressions (MWEs) are a frequently occurring phenomenon found in all natural languages that is of great importance to linguistic theory, natural language processing applications, and machine translation systems. Neural Machine Translation (NMT) architectures do not handle these expressions well and previous studies have rarely addressed MWEs in this framework. In this work, we show that annotation and data augmentation, using external linguistic resources, can improve both translation of MWEs that occur in the source, and the generation of MWEs on the target, and increase performance by up to 5.09 BLEU points on MWE test sets. We also devise a MWE score to specifically assess the quality of MWE translation which agrees with human evaluation. We make available the MWE score implementation – along with MWE-annotated training sets and corpus-based lists of MWEs – for reproduction and extension.- Anthology ID:
- 2020.lrec-1.471
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association
- Note:
- Pages:
- 3816–3825
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.471
- DOI:
- Cite (ACL):
- Andrea Zaninello and Alexandra Birch. 2020. Multiword Expression aware Neural Machine Translation. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3816–3825, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Multiword Expression aware Neural Machine Translation (Zaninello & Birch, LREC 2020)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2020.lrec-1.471.pdf