Jackson Liscombe
We present Witchcraft, an open-source framework for evaluating prediction models for spoken dialogue systems based on interaction logs and audio recordings. The use of Witchcraft is twofold: first, it provides an adaptable user interface for easily managing and browsing thousands of logged dialogues (e.g., calls). Second, with the help of the underlying models and the connected machine-learning framework RapidMiner, the workbench is able to display, at each dialogue turn, the probability of the task being completed based on the dialogue history. It also estimates the emotional state, gender, and age of the user. While browsing through a logged conversation, the user can directly observe the prediction results of the models at each dialogue step. In this way, Witchcraft allows problematic dialogue situations to be spotted and shows where the current system and the prediction models have design flaws. Witchcraft will be made publicly available to the community and released as an open-source project.
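As an illustration only (the abstract gives no implementation details), the per-turn prediction loop described above might look roughly like the Python sketch below. The Turn class, the annotate_dialogue function, and the model objects with predict/predict_proba methods are hypothetical placeholders, not Witchcraft's actual API or the RapidMiner interface.

from dataclasses import dataclass

@dataclass
class Turn:
    turn_id: int
    features: dict  # acoustic/interaction features logged for this turn

def annotate_dialogue(turns, completion_model, emotion_model, gender_model, age_model):
    """Attach model predictions to each turn of a logged dialogue (illustrative only)."""
    history = []
    annotated = []
    for turn in turns:
        history.append(turn.features)
        annotated.append({
            "turn": turn.turn_id,
            # probability of task completion given the dialogue history so far
            "p_completion": completion_model.predict_proba(history),
            # per-turn estimates of the caller's state and demographics
            "emotion": emotion_model.predict(turn.features),
            "gender": gender_model.predict(turn.features),
            "age_group": age_model.predict(turn.features),
        })
    return annotated

A browsing tool such as the one described could then render the returned per-turn probabilities alongside the transcript, making dips in predicted task success visible at the exact dialogue step where they occur.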
This paper addresses the recognition of elderly callers based on short, narrow-band utterances, which are typical for Interactive Voice Response (IVR) systems. Our study is based on 2308 short utterances from a deployed IVR application. We show that features such as speaking rate, jitter, and shimmer, which are considered among the most meaningful for identifying elderly users, underperform in the IVR context, while pitch and intensity features appear to gain importance. We further demonstrate the influence of utterance length on classifier performance: for both humans and the classifier, distinguishing aged from non-aged voices becomes increasingly difficult the shorter the utterances get. Our setup, based on a Support Vector Machine (SVM) with a linear kernel, reaches a comparatively poor accuracy of 58%, which can be attributed to an average utterance length of only 1.6 seconds. The automatic distinction between aged and non-aged utterances drops to chance level when the utterance length falls below 1.2 seconds.
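As a minimal sketch, not the authors' code, the classification setup named in the abstract (a linear-kernel SVM over per-utterance acoustic features) could be set up with scikit-learn as below. The feature and label files, the specific feature set, and the 10-fold cross-validation are assumptions for illustration, not details taken from the paper.

import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# X: one row per utterance, columns = acoustic features extracted beforehand
# (e.g. pitch and intensity statistics, speaking rate, jitter, shimmer)
# y: 1 = aged caller, 0 = non-aged caller
X = np.load("utterance_features.npy")  # hypothetical feature file
y = np.load("utterance_labels.npy")    # hypothetical label file

# Linear-kernel SVM with feature standardization, evaluated by cross-validation
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
scores = cross_val_score(clf, X, y, cv=10, scoring="accuracy")
print(f"Mean 10-fold accuracy: {scores.mean():.2%}")

With utterances averaging only 1.6 seconds, the per-utterance feature estimates (especially rate-, jitter-, and shimmer-based ones) rest on very little speech material, which is consistent with the modest accuracy the abstract reports.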