Data Augmentation by Data Noising for Open-vocabulary Slots in Spoken Language Understanding

Hwa-Yeon Kim; Yoon-Hyung Roh; Young-Kil Kim

doi:10.18653/v1/N19-3014

Abstract

One of the main challenges in Spoken Language Understanding (SLU) is dealing with ‘open-vocabulary’ slots. Recently, SLU models based on neural networks have been proposed, but it is still difficult to recognize slots containing unknown words, or ‘open-vocabulary’ slots, because of the high cost of creating a manually tagged SLU dataset. This paper proposes data noising, which reflects the characteristics of ‘open-vocabulary’ slots, for data augmentation. We applied it to an attention-based bi-directional recurrent neural network (Liu and Lane, 2016) and experimented with three datasets: Airline Travel Information System (ATIS), Snips, and MIT-Restaurant. We achieved performance improvements of up to 0.57% in intent prediction (accuracy) and 3.25 in slot filling (F1-score). Our method is advantageous because it requires no additional memory and can be applied simultaneously with model training.

Anthology ID: N19-3014
Volume: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Sudipta Kar, Farah Nadeem, Laura Burdick, Greg Durrett, Na-Rae Han
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 97–102
URL: https://preview.aclanthology.org/fix-sig-urls/N19-3014/
DOI: 10.18653/v1/N19-3014
Cite (ACL): Hwa-Yeon Kim, Yoon-Hyung Roh, and Young-Kil Kim. 2019. Data Augmentation by Data Noising for Open-vocabulary Slots in Spoken Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 97–102, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): Data Augmentation by Data Noising for Open-vocabulary Slots in Spoken Language Understanding (Kim et al., NAACL 2019)
PDF: https://preview.aclanthology.org/fix-sig-urls/N19-3014.pdf
