Abstract
Building on recent advances in semantic parsing and text simplification, we investigate the use of semantic splitting of the source sentence as preprocessing for machine translation. We experiment with a Transformer model and evaluate using large-scale crowd-sourcing experiments. Results show a significant increase in fluency on long sentences on an English-to-French setting with a training corpus of 5M sentence pairs, while retaining comparable adequacy. We also perform a manual analysis which explores the tradeoff between adequacy and fluency in the case where all sentence lengths are considered.
- Anthology ID:
- 2020.starsem-1.6
- Volume:
- Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics
- Month:
- December
- Year:
- 2020
- Address:
- Barcelona, Spain (Online)
- Venue:
- *SEM
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Pages:
- 50–57
- URL:
- https://aclanthology.org/2020.starsem-1.6
- Cite (ACL):
- Elior Sulem, Omri Abend, and Ari Rappoport. 2020. Semantic Structural Decomposition for Neural Machine Translation. In Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics, pages 50–57, Barcelona, Spain (Online). Association for Computational Linguistics.
- Cite (Informal):
- Semantic Structural Decomposition for Neural Machine Translation (Sulem et al., *SEM 2020)
- PDF:
- https://preview.aclanthology.org/nodalida-main-page/2020.starsem-1.6.pdf
- Code:
- eliorsulem/semantic-structural-decomposition-for-nmt
- Data:
- WikiSplit