Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization

Abhinav Kumar, Barbara Di Eugenio, Jillian Aurisano, Andrew Johnson


Abstract
Our goal is to develop an intelligent assistant that helps users explore data via visualizations. We have collected a new corpus of conversations, CHICAGO-CRIME-VIS, geared towards supporting data visualization exploration, and we have annotated it for a variety of features, including contextualized dialogue acts. In this paper, we describe our strategies for dialogue act classification and their evaluation. We highlight how thinking aloud affects the interpretation of dialogue acts in our setting and how best to capture that information. A key component of our strategy is data augmentation applied to the training data, since our corpus is inherently small. We ran experiments with the Balanced Bagging Classifier (BAGC), Conditional Random Field (CRF), and several Long Short-Term Memory (LSTM) networks, and found that all of them improved compared to the baseline (i.e., without the data augmentation pipeline). CRF outperformed the other classification algorithms, with the LSTM networks showing only modest improvement, even after obtaining a performance boost from domain-trained word embeddings. This result is of note because training a CRF is far less resource-intensive than training deep learning models; given similar if not better performance, traditional methods may still be preferable in order to lower resource consumption.
Anthology ID:
2020.lrec-1.74
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
590–599
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.74
Cite (ACL):
Abhinav Kumar, Barbara Di Eugenio, Jillian Aurisano, and Andrew Johnson. 2020. Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 590–599, Marseille, France. European Language Resources Association.
Cite (Informal):
Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization (Kumar et al., LREC 2020)
PDF:
https://preview.aclanthology.org/landing_page/2020.lrec-1.74.pdf