Abstract
In this paper we present a new method for intent recognition for complex dialog management in low resource situations. Complex dialog management is required because our target domain is real world mixed initiative food ordering between agents and their customers, where individual customer utterances may contain multiple intents and refer to food items with complex structure. For example, a customer might say “Can I get a deluxe burger with large fries and oh put extra mayo on the burger would you?” We approach this task as a multi-level sequence labeling problem, with the constraint of limited real training data. Both traditional methods like HMM, MEMM, or CRF and newer methods like DNN or BiLSTM use only homogeneous feature sets. Newer methods perform better but also require considerably more data. Previous research has done pseudo-data synthesis to obtain the required amounts of training data. We propose to use a k-NN learner with heterogeneous feature set. We used windowed word n-grams, POS tag n-grams and pre-trained word embeddings as features. For the experiments we perform a comparison between using pseudo-data and real world data. We also perform semi-supervised self-training to obtain additional labeled data, in order to better model real world scenarios. Instead of using massive pseudo-data, we show that with only less than 1% of the data size, we can achieve better result than any of the methods above by annotating real world data. We achieve labeled bracketed F-scores of 75.46, 52.84 and 49.66 for the three levels of sequence labeling where each level has a longer word span than its previous level. Overall we achieve 60.71F. In comparison, two previous systems, MEMM and DNN-ELMO, achieved 52.32 and 45.25 respectively.- Anthology ID:
- N19-2019
- Volume:
- Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)
- Month:
- June
- Year:
- 2019
- Address:
- Minneapolis, Minnesota
- Editors:
- Anastassia Loukina, Michelle Morales, Rohit Kumar
- Venue:
- NAACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 149–156
- Language:
- URL:
- https://aclanthology.org/N19-2019
- DOI:
- 10.18653/v1/N19-2019
- Cite (ACL):
- Yue Chen and John Chen. 2019. A k-Nearest Neighbor Approach towards Multi-level Sequence Labeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers), pages 149–156, Minneapolis, Minnesota. Association for Computational Linguistics.
- Cite (Informal):
- A k-Nearest Neighbor Approach towards Multi-level Sequence Labeling (Chen & Chen, NAACL 2019)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/N19-2019.pdf