Building Chatbots from Forum Data: Model Selection Using Question Answering Metrics

Martin Boyanov, Preslav Nakov, Alessandro Moschitti, Giovanni Da San Martino, Ivan Koychev

Abstract
We propose to use question answering (QA) data from Web forums to train chatbots from scratch, i.e., without dialog data. First, we extract pairs of question and answer sentences from the typically much longer texts of questions and answers in a forum. We then use these shorter texts to train seq2seq models more efficiently. We further improve the parameter optimization using a new model selection strategy based on QA measures. Finally, we propose to use extrinsic evaluation with respect to a QA task as an automatic evaluation method for chatbot systems. The evaluation shows that the model achieves a mean average precision (MAP) of 63.5% on the extrinsic task. Moreover, our manual evaluation demonstrates that the model correctly answers 49.5% of the questions when they are similar in style to how questions are asked in the forum, and 47.3% of the questions when they are more conversational in style.
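The abstract names MAP as the QA measure behind both the extrinsic evaluation and model selection, but it does not say how the score is computed or how candidate models are compared. The following is a minimal sketch of the standard MAP definition plus a pick-the-best-checkpoint step, not the authors' implementation. All function names and the checkpoint labels are hypothetical, and the sketch assumes every relevant answer for a question appears somewhere in its ranked list.

```python
from typing import Dict, List


def average_precision(relevance: List[bool]) -> float:
    """Average precision for one question.

    `relevance[i]` says whether the answer ranked at position i + 1 is relevant.
    Assumption: all relevant answers are present in the ranked list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: List[List[bool]]) -> float:
    """Mean of the per-question average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


def select_checkpoint(map_by_checkpoint: Dict[str, float]) -> str:
    """Return the (hypothetical) checkpoint name with the highest MAP."""
    return max(map_by_checkpoint, key=map_by_checkpoint.get)
```

Under these assumptions, model selection amounts to scoring each saved checkpoint on a development set with `mean_average_precision` and keeping the one that `select_checkpoint` returns, rather than the one with the lowest training loss.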
Anthology ID:
R17-1018
Volume:
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
121–129
URL:
https://doi.org/10.26615/978-954-452-049-6_018
DOI:
10.26615/978-954-452-049-6_018
Cite (ACL):
Martin Boyanov, Preslav Nakov, Alessandro Moschitti, Giovanni Da San Martino, and Ivan Koychev. 2017. Building Chatbots from Forum Data: Model Selection Using Question Answering Metrics. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 121–129, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Building Chatbots from Forum Data: Model Selection Using Question Answering Metrics (Boyanov et al., RANLP 2017)
PDF:
https://doi.org/10.26615/978-954-452-049-6_018