Automatic Learning and Evaluation of User-Centered Objective Functions for Dialogue System Optimisation

Verena Rieser, Oliver Lemon


Abstract
The ultimate goal when building dialogue systems is to satisfy the needs of real users, but quality assurance for dialogue strategies is a non-trivial problem. The applied evaluation metrics and resulting design principles are often obscure, emerge by trial-and-error, and are highly context dependent. This paper introduces data-driven methods for obtaining reliable objective functions for system design. In particular, we test whether an objective function obtained from Wizard-of-Oz (WOZ) data is a valid estimate of real users’ preferences. We test this in a test-retest comparison between the model obtained from the WOZ study and the models obtained when testing with real users. We can show that, despite a low fit to the initial data, the objective function obtained from WOZ data makes accurate predictions for automatic dialogue evaluation, and, when automatically optimising a policy using these predictions, the improvement over a strategy simply mimicking the data becomes clear from an error analysis.
Anthology ID:
L08-1171
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/592_paper.pdf
DOI:
Bibkey:
Cite (ACL):
Verena Rieser and Oliver Lemon. 2008. Automatic Learning and Evaluation of User-Centered Objective Functions for Dialogue System Optimisation. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08), Marrakech, Morocco. European Language Resources Association (ELRA).
Cite (Informal):
Automatic Learning and Evaluation of User-Centered Objective Functions for Dialogue System Optimisation (Rieser & Lemon, LREC 2008)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/592_paper.pdf