Learning Health-Bots from Training Data that was Automatically Created using Paraphrase Detection and Expert Knowledge

Anna Liednikova, Philippe Jolivet, Alexandre Durand-Salmon, Claire Gardent


Abstract
A key bottleneck for developing dialog models is the lack of adequate training data. Due to privacy issues, dialog data is even scarcer in the health domain. We propose a novel method for creating dialog corpora, which we apply to create doctor-patient interaction data. We use this data to learn both a generation model and a hybrid classification/retrieval model, and find that the generation model consistently outperforms the hybrid model. We show that our data creation method has several advantages. Not only does it allow for the semi-automatic creation of large quantities of training data, it also provides a natural way of guiding learning and a novel method for assessing the quality of human-machine interactions.
Anthology ID:
2020.coling-main.55
Volume:
Proceedings of the 28th International Conference on Computational Linguistics
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Venue:
COLING
Publisher:
International Committee on Computational Linguistics
Pages:
638–648
URL:
https://aclanthology.org/2020.coling-main.55
DOI:
10.18653/v1/2020.coling-main.55
Cite (ACL):
Anna Liednikova, Philippe Jolivet, Alexandre Durand-Salmon, and Claire Gardent. 2020. Learning Health-Bots from Training Data that was Automatically Created using Paraphrase Detection and Expert Knowledge. In Proceedings of the 28th International Conference on Computational Linguistics, pages 638–648, Barcelona, Spain (Online). International Committee on Computational Linguistics.
Cite (Informal):
Learning Health-Bots from Training Data that was Automatically Created using Paraphrase Detection and Expert Knowledge (Liednikova et al., COLING 2020)
PDF:
https://preview.aclanthology.org/update-css-js/2020.coling-main.55.pdf