Abstract
In spoken dialog systems (SDSs), dialog act (DA) segmentation and recognition provide essential information for response generation. A majority of previous works assumed ground-truth segmentation of DA units, which is not available from automatic speech recognition (ASR) in SDS. We propose a unified architecture based on neural networks, which consists of a sequence tagger for segmentation and a classifier for recognition. The DA recognition model is based on hierarchical neural networks to incorporate the context of preceding sentences. We investigate sharing some layers of the two components so that they can be trained jointly and learn generalized features from both tasks. An evaluation on the Switchboard Dialog Act (SwDA) corpus shows that the jointly-trained models outperform independently-trained models, single-step models, and other reported results in DA segmentation, recognition, and joint tasks.

- Anthology ID: W18-5021
- Volume: Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue
- Month: July
- Year: 2018
- Address: Melbourne, Australia
- Editors: Kazunori Komatani, Diane Litman, Kai Yu, Alex Papangelis, Lawrence Cavedon, Mikio Nakano
- Venue: SIGDIAL
- SIG: SIGDIAL
- Publisher: Association for Computational Linguistics
- Pages: 201–208
- URL: https://aclanthology.org/W18-5021
- DOI: 10.18653/v1/W18-5021
- Cite (ACL): Tianyu Zhao and Tatsuya Kawahara. 2018. A Unified Neural Architecture for Joint Dialog Act Segmentation and Recognition in Spoken Dialog System. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pages 201–208, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal): A Unified Neural Architecture for Joint Dialog Act Segmentation and Recognition in Spoken Dialog System (Zhao & Kawahara, SIGDIAL 2018)
- PDF: https://preview.aclanthology.org/naacl-24-ws-corrections/W18-5021.pdf