Philippe Bretier
With a view to rationalising the evaluation process within the Orange Labs spoken dialogue system projects, a field audit was conducted among the various professionals involved. This article presents the study's main conclusions and outlines perspectives for improving the evaluation process in such a complex organisation. We first present the typical spoken dialogue system project lifecycle and the communities of stakeholders involved. We then sketch a map of the indicators used across the teams, which shows that each professional category designs its evaluation metrics on a case-by-case basis, each targeting different goals and methodologies. Finally, we identify weaknesses in how the evaluation process is handled by the various teams. Among others, we mention: the dependency on design and exploitation tools that may not be suitable for adequately collecting relevant indicators; the need to refine the definition and analysis of some indicators to obtain information valuable for system enhancement; the sharing issue, which argues for a common definition of indicators across the teams; and, as a consequence, the need for shared applications that support and encourage such a rationalisation.
The evaluation of human-machine dialogue systems is a difficult problem for which, at present, neither the objectives nor the proposed solutions command a consensus. Traditional ergonomic approaches submit the dialogue system to the critical eye of the user and attempt to capture what the user expresses, but the absence of an objectifiable framework for these users' usage prevents comparison between different systems, or between successive versions of the same system. We propose to invert this view and to measure the user's behaviour with respect to the dialogue system. Thus, instead of evaluating the system's adequacy to its users, we measure the users' adequacy to the system. This paradigm shift permits a change of reference frame: no longer the users' usage, but the framework defined by the system. Since the system is completely defined, this paradigm enables quantitative approaches and therefore comparative evaluations of systems.