Abstract
Applying the Transformer architecture at the character level usually requires very deep architectures that are difficult and slow to train. These problems can be partially overcome by incorporating token segmentation into the model. We show that by first training a subword model and then finetuning it on characters, we can obtain a neural machine translation model that works at the character level without requiring token segmentation. We use only the vanilla 6-layer Transformer Base architecture. Our character-level models better capture morphological phenomena and show more robustness to noise, at the expense of somewhat worse overall translation quality. Our study is a significant step towards high-performance and easy-to-train character-based models that are not extremely large.
- Anthology ID: 2020.emnlp-main.203
- Volume: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Month: November
- Year: 2020
- Address: Online
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 2572–2579
- URL: https://aclanthology.org/2020.emnlp-main.203
- DOI: 10.18653/v1/2020.emnlp-main.203
- Cite (ACL): Jindřich Libovický and Alexander Fraser. 2020. Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2572–2579, Online. Association for Computational Linguistics.
- Cite (Informal): Towards Reasonably-Sized Character-Level Transformer NMT by Finetuning Subword Systems (Libovický & Fraser, EMNLP 2020)
- PDF: https://preview.aclanthology.org/paclic-22-ingestion/2020.emnlp-main.203.pdf
- Code: jlibovicky/char-nmt (+ additional community code)
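
The repository above hosts the authors' implementation. As a rough illustration of the approach described in the abstract, and not the authors' code, the sketch below shows one plausible character-level re-segmentation step: the corpus that a subword-trained Transformer Base model would then be finetuned on. The function name `segment_chars` and the word-boundary symbol are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation) of switching
# from subword to character segmentation for finetuning: the Transformer Base
# model is first trained on subword-segmented data, and training then
# continues on the same corpus re-segmented into single characters, starting
# from the subword checkpoint's weights.

WORD_SEP = "▁"  # explicit word-boundary token; the exact symbol is an assumption


def segment_chars(sentence: str) -> list[str]:
    """Split a sentence into single characters, marking word boundaries."""
    tokens: list[str] = []
    for word in sentence.strip().split():
        tokens.append(WORD_SEP)    # boundary marker before each word
        tokens.extend(word)        # one token per character
    return tokens[1:]              # drop the leading boundary marker


if __name__ == "__main__":
    print(segment_chars("character level NMT"))
    # ['c', 'h', 'a', 'r', 'a', 'c', 't', 'e', 'r', '▁', 'l', 'e', 'v', 'e', 'l', '▁', 'N', 'M', 'T']
```

One plausible reason such finetuning can start from the subword checkpoint without architectural changes is that BPE-style vocabularies typically already contain all single characters, so the embedding and output layers can cover character-segmented input; this is an assumption about a typical setup rather than a claim about the paper's exact procedure.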