Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels
Itsumi Saito, Jun Suzuki, Kyosuke Nishida, Kugatsu Sadamitsu, Satoshi Kobashikawa, Ryo Masumura, Yuji Matsumoto, Junji Tomita
Abstract
In this study, we investigated the effectiveness of augmented data for encoder-decoder-based neural normalization models. Attention-based encoder-decoder models are highly effective in many natural language generation tasks, such as machine translation and summarization. In general, a large amount of training data must be prepared to train an encoder-decoder model. Unlike machine translation, text normalization tasks have little training data available. In this paper, we propose two methods for generating augmented data. The experimental results with Japanese dialect normalization indicate that our methods are effective for an encoder-decoder model and achieve higher BLEU scores than the baselines. We also investigated the oracle performance and revealed that there is sufficient room for improving an encoder-decoder model.
- Anthology ID:
- I17-2044
- Volume:
- Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
- Month:
- November
- Year:
- 2017
- Address:
- Taipei, Taiwan
- Editors:
- Greg Kondrak, Taro Watanabe
- Venue:
- IJCNLP
- Publisher:
- Asian Federation of Natural Language Processing
- Pages:
- 257–262
- URL:
- https://aclanthology.org/I17-2044
- Cite (ACL):
- Itsumi Saito, Jun Suzuki, Kyosuke Nishida, Kugatsu Sadamitsu, Satoshi Kobashikawa, Ryo Masumura, Yuji Matsumoto, and Junji Tomita. 2017. Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 257–262, Taipei, Taiwan. Asian Federation of Natural Language Processing.
- Cite (Informal):
- Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels (Saito et al., IJCNLP 2017)
- PDF:
- https://preview.aclanthology.org/teach-a-man-to-fish/I17-2044.pdf