Abstract
The neural hidden Markov model has been proposed as an alternative to the attention mechanism in machine translation with recurrent neural networks. However, since the introduction of transformer models, its performance has been surpassed. This work introduces the concept of the hidden Markov model into the transformer architecture, and the resulting model outperforms the transformer baseline. Interestingly, we find that the zero-order model already provides promising performance, giving it an edge over the first-order model, which performs comparably but is significantly slower in training and decoding.
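To make the difference between the zero-order and first-order variants concrete, the following is a generic direct-HMM factorization over target-to-source alignments, written in the spirit of earlier neural HMM work; the notation and the exact parameterization used in the paper may differ. Here $e_1^I$ denotes the target sentence, $f_1^J$ the source sentence, and $b_i \in \{1, \dots, J\}$ the source position aligned to target position $i$.

First-order dependency (the alignment distribution conditions on the previous alignment, so the sum runs over whole alignment sequences and is evaluated with a forward-style recursion):

$$p(e_1^I \mid f_1^J) = \sum_{b_1^I} \prod_{i=1}^{I} p(b_i \mid b_{i-1}, e_1^{i-1}, f_1^J)\; p(e_i \mid b_i, e_1^{i-1}, f_1^J)$$

Zero-order dependency (the alignment at each position is independent of the previous one, so the sum factorizes position by position):

$$p(e_1^I \mid f_1^J) = \prod_{i=1}^{I} \sum_{b_i=1}^{J} p(b_i \mid e_1^{i-1}, f_1^J)\; p(e_i \mid b_i, e_1^{i-1}, f_1^J)$$

Under this sketch, the zero-order sum costs roughly $O(I \cdot J)$ probability evaluations per sentence, while the first-order forward recursion scales as $O(I \cdot J^2)$, which is consistent with the reported slowdown of the first-order model in training and decoding.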
- Anthology ID:
- 2021.acl-srw.3
- Volume:
- Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop
- Month:
- August
- Year:
- 2021
- Address:
- Online
- Venues:
- ACL | IJCNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 23–32
- URL:
- https://aclanthology.org/2021.acl-srw.3
- DOI:
- 10.18653/v1/2021.acl-srw.3
- Cite (ACL):
- Weiyue Wang, Zijian Yang, Yingbo Gao, and Hermann Ney. 2021. Transformer-Based Direct Hidden Markov Model for Machine Translation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 23–32, Online. Association for Computational Linguistics.
- Cite (Informal):
- Transformer-Based Direct Hidden Markov Model for Machine Translation (Wang et al., ACL-IJCNLP 2021)
- PDF:
- https://aclanthology.org/2021.acl-srw.3.pdf