Beyond Adjacency Pairs: Hierarchical Clustering of Long Sequences for Human-Machine Dialogues

Maitreyee Maitreyee


Abstract
This work proposes a framework for predicting sequences in dialogues using turn-based syntactic features and dialogue control functions. Syntactic features were extracted with dependency parsing, while dialogue control functions were manually labelled. These features were transformed using tf-idf and word embeddings, and feature selection was performed with Principal Component Analysis (PCA). We ran experiments on six combinations of features to predict sequences with Hierarchical Agglomerative Clustering. An analysis of the clustering results indicates that using word embeddings and syntactic features significantly improved the results.
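The kind of pipeline the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: the toy turns, the label names, the smoothed idf variant, cosine distance, and average linkage are all assumptions chosen for illustration, and the PCA and word-embedding steps are omitted to keep the example dependency-free.

```python
import math
from collections import Counter


def tfidf(docs):
    """Map each tokenised turn to a sparse tf-idf vector (dict: token -> weight)."""
    n = len(docs)
    # Document frequency: number of turns in which each token occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed idf (an assumption; other variants exist).
            tok: (count / len(doc)) * (math.log((1 + n) / (1 + df[tok])) + 1)
            for tok, count in tf.items()
        })
    return vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative(vectors, n_clusters):
    """Bottom-up clustering with average linkage; returns lists of turn indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        _, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical turns: dependency labels plus one manually assigned
# dialogue control function tag per turn (labels invented for illustration).
turns = [
    ["nsubj", "root", "dobj", "question"],
    ["nsubj", "root", "dobj", "question", "aux"],
    ["intj", "root", "feedback"],
    ["intj", "root", "feedback", "advmod"],
]

clusters = agglomerative(tfidf(turns), n_clusters=2)
print(clusters)
```

In this toy setup the two question-like turns and the two feedback-like turns end up in separate clusters. In practice one would reduce the feature dimensionality (for example with PCA) before clustering, as the abstract indicates.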
Anthology ID:
2020.codi-1.2
Volume:
Proceedings of the First Workshop on Computational Approaches to Discourse
Month:
November
Year:
2020
Address:
Online
Venue:
CODI
Publisher:
Association for Computational Linguistics
Pages:
11–19
URL:
https://aclanthology.org/2020.codi-1.2
DOI:
10.18653/v1/2020.codi-1.2
Cite (ACL):
Maitreyee Maitreyee. 2020. Beyond Adjacency Pairs: Hierarchical Clustering of Long Sequences for Human-Machine Dialogues. In Proceedings of the First Workshop on Computational Approaches to Discourse, pages 11–19, Online. Association for Computational Linguistics.
Cite (Informal):
Beyond Adjacency Pairs: Hierarchical Clustering of Long Sequences for Human-Machine Dialogues (Maitreyee, CODI 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.codi-1.2.pdf