Local Monotonic Attention Mechanism for End-to-End Speech And Language Processing

Andros Tjandra, Sakriani Sakti, Satoshi Nakamura


Abstract
Recently, encoder-decoder neural networks have shown impressive performance on many sequence-related tasks. The architecture commonly uses an attention mechanism that allows the model to learn alignments between the source and target sequences. Most attention mechanisms in use today are based on a global attention property, which requires computing a weighted summary of the whole input sequence produced by the encoder states. However, this is computationally expensive and often produces misalignments on longer input sequences. Furthermore, it does not fit the monotonic, left-to-right nature of several tasks, such as automatic speech recognition (ASR) and grapheme-to-phoneme conversion (G2P). In this paper, we propose a novel attention mechanism that has local and monotonic properties, and we explore various ways to control those properties. Experimental results on ASR, G2P, and machine translation between two languages with similar sentence structures demonstrate that the proposed encoder-decoder model with local monotonic attention could achieve significant performance improvements and reduce computational complexity compared with a model that uses the standard global attention architecture.
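The abstract contrasts global attention, which scores and sums every encoder state at each decoder step, with attention that is local (only a window of encoder states is considered) and monotonic (the focus only moves forward). The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' formulation. The dot-product score, the hard window of half-width `half_width`, and passing the position increment `delta` in directly (instead of predicting it from the decoder state) are all hypothetical choices, and every function name is illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product score (an assumed scoring function)."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights: Sequence[float], states: Sequence[Vector]) -> List[float]:
    """Context vector: weighted sum of the given encoder states."""
    dim = len(states[0])
    return [sum(w * s[d] for w, s in zip(weights, states)) for d in range(dim)]


def global_attention(query: Vector, enc: Sequence[Vector]) -> Tuple[List[float], List[float]]:
    """Global attention: scores and normalizes over all S encoder states (O(S) per step)."""
    weights = softmax([dot(query, h) for h in enc])
    return weights, weighted_sum(weights, enc)


def local_monotonic_attention(
    query: Vector,
    enc: Sequence[Vector],
    prev_pos: float,
    delta: float,
    half_width: int,
) -> Tuple[float, int, List[float], List[float]]:
    """Hypothetical local monotonic step.

    The center moves forward by a non-negative `delta` (a trained model would
    predict it from the decoder state; here it is passed in), so positions never
    decrease. Only the states within `half_width` of the rounded center are scored,
    i.e. at most 2 * half_width + 1 states per step instead of all S.
    Returns (new position, window start index, window weights, context vector).
    """
    if delta < 0:
        raise ValueError("delta must be non-negative to keep the alignment monotonic")
    pos = min(prev_pos + delta, len(enc) - 1)  # clip to the last encoder index
    center = int(round(pos))
    lo = max(0, center - half_width)
    hi = min(len(enc), center + half_width + 1)
    window = enc[lo:hi]
    weights = softmax([dot(query, h) for h in window])
    return pos, lo, weights, weighted_sum(weights, window)
```

In a usage loop, each decoder step calls `local_monotonic_attention` with the position returned by the previous step, so the attended window can only slide forward through the encoder states. This per-step cost, which depends on the window size rather than the input length, is the computational saving the abstract refers to. The attention weights still sum to one, but only over the window.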
Anthology ID:
I17-1044
Volume:
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Editors:
Greg Kondrak, Taro Watanabe
Venue:
IJCNLP
Publisher:
Asian Federation of Natural Language Processing
Pages:
431–440
URL:
https://aclanthology.org/I17-1044
Cite (ACL):
Andros Tjandra, Sakriani Sakti, and Satoshi Nakamura. 2017. Local Monotonic Attention Mechanism for End-to-End Speech And Language Processing. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 431–440, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Local Monotonic Attention Mechanism for End-to-End Speech And Language Processing (Tjandra et al., IJCNLP 2017)
PDF:
https://preview.aclanthology.org/emnlp22-frontmatter/I17-1044.pdf