Abstract
Evaluation of complex, collaborative dialogue systems is a difficult task. Traditionally, developers have relied upon subjective feedback from the user, and parametrisation over observable metrics. However, both approaches rely to some extent on the notion of a task; that is, the system is helping the user achieve some clearly defined goal, such as booking a flight or completing a banking transaction. It is not clear that such metrics are as useful when dealing with a system that has a more complex task, or even no definable task at all, beyond maintaining and performing a collaborative dialogue. Working within the EU-funded COMPANIONS program, we investigate the use of appropriateness as a measure of conversation quality, the hypothesis being that good companions need to be good conversational partners. We report initial work in the direction of annotating dialogue for indicators of good conversation, including the annotation and comparison of the output of two generations of the same dialogue system.
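The paper itself does not include code; the following is a minimal sketch of the annotation-and-comparison setup the abstract describes, in which each system turn is labelled as appropriate or not by two annotators. The label set, annotator data, and function names here are hypothetical; the paper's actual annotation scheme may differ. Chance-corrected agreement is measured with Cohen's kappa, a standard choice for two-annotator labelling tasks.

```python
from collections import Counter

# Hypothetical per-turn appropriateness labels from two annotators
# over the same dialogue (illustrative data, not from the paper).
annotator_a = ["app", "app", "inapp", "app", "app", "inapp", "app", "app"]
annotator_b = ["app", "inapp", "inapp", "app", "app", "app", "app", "app"]

def cohen_kappa(a, b):
    """Chance-corrected agreement between two annotators' label sequences."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    # Expected agreement if both annotators labelled at random
    # with their own observed label frequencies.
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

def appropriateness_rate(a, b):
    """Share of turns both annotators judged appropriate."""
    return sum(x == y == "app" for x, y in zip(a, b)) / len(a)

print(f"kappa = {cohen_kappa(annotator_a, annotator_b):.3f}")
print(f"appropriate-turn rate = {appropriateness_rate(annotator_a, annotator_b):.2f}")
```

Under this setup, comparing two generations of the same system reduces to annotating a dialogue sample from each and comparing their appropriate-turn rates, with kappa reported to show the labels are reliable enough to support the comparison.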
- Anthology ID:
- L10-1071
- Volume:
- Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
- Month:
- May
- Year:
- 2010
- Address:
- Valletta, Malta
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- URL:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/115_Paper.pdf
- Cite (ACL):
- Nick Webb, David Benyon, Preben Hansen, and Oli Mival. 2010. Evaluating Human-Machine Conversation for Appropriateness. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
- Cite (Informal):
- Evaluating Human-Machine Conversation for Appropriateness (Webb et al., LREC 2010)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2010/pdf/115_Paper.pdf