Satoshi Kobashikawa
2017
Improving Neural Text Normalization with Data Augmentation at Character- and Morphological Levels
Itsumi Saito, Jun Suzuki, Kyosuke Nishida, Kugatsu Sadamitsu, Satoshi Kobashikawa, Ryo Masumura, Yuji Matsumoto, Junji Tomita
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
In this study, we investigated the effectiveness of augmented data for encoder-decoder-based neural normalization models. Attention-based encoder-decoder models are highly effective for many natural language generation tasks, such as machine translation and summarization. However, training an encoder-decoder model generally requires a large amount of training data, and unlike machine translation, text normalization tasks have little training data available. In this paper, we propose two methods for generating augmented data, one at the character level and one at the morphological level. Experimental results on Japanese dialect normalization indicate that our methods are effective for an encoder-decoder model and achieve higher BLEU scores than the baselines. We also investigated the oracle performance and found that there is still substantial room for improving the encoder-decoder model.