2012
An audiovisual political speech analysis incorporating eye-tracking and perception data
Stefan Scherer | Georg Layher | John Kane | Heiko Neumann | Nick Campbell
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We investigate the influence of audiovisual features on the perception of speaking style and performance of politicians, utilizing a large publicly available dataset of German parliament recordings. We conduct a human perception experiment involving eye-tracker data to evaluate human ratings as well as behavior in two separate conditions, i.e. audiovisual and video-only. The ratings are evaluated on a five-dimensional scale comprising measures of insecurity, monotony, expressiveness, persuasiveness, and overall performance. Further, they are statistically analyzed and put into context in a multimodal feature analysis involving measures of prosody, voice quality and motion energy. The analysis reveals several statistically significant features, such as pause timing, voice quality measures and motion energy, that correlate strongly, either positively or negatively, with certain human ratings of speaking style. Additionally, we compare the gaze behavior of the human subjects to evaluate saliency regions in the audiovisual and video-only conditions. The eye-tracking analysis reveals significant changes in gaze behavior: in the audiovisual condition, participants narrow their focus of attention mainly to the politician's face, whereas in the video-only condition they scan the upper body, including hands and arms.
2008
The PIT Corpus of German Multi-Party Dialogues
Petra-Maria Strauß | Holger Hoffmann | Wolfgang Minker | Heiko Neumann | Günther Palm | Stefan Scherer | Harald Traue | Ulrich Weidenbacher
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The PIT corpus is a German multi-media corpus of multi-party dialogues recorded in a Wizard-of-Oz environment at the University of Ulm. The scenario involves two human dialogue partners interacting with a multi-modal dialogue system in the domain of restaurant selection. In this paper we present the characteristics of the data, which was recorded in three sessions, resulting in a total of 75 dialogues and about 14 hours of audio and video data. The corpus is available at http://www.uni-ulm.de/in/pit.
2006
Wizard-of-Oz Data Collection for Perception and Interaction in Multi-User Environments
Petra-Maria Strauß | Holger Hoffman | Wolfgang Minker | Heiko Neumann | Günther Palm | Stefan Scherer | Friedhelm Schwenker | Harald Traue | Welf Walter | Ulrich Weidenbacher
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)
In this paper we present the setup of an extensive Wizard-of-Oz environment used for the data collection and the development of a dialogue system. The envisioned Perception and Interaction Assistant will act as an independent dialogue partner. Passively observing the dialogue between the two human users with respect to a limited domain, the system should take the initiative and get meaningfully involved in the communication process when required by the conversational situation. The data collection described here involves audio and video data. We aim at building a rich multi-media data corpus to be used as a basis for our research, which includes, inter alia, speech and gaze direction recognition, dialogue modelling and proactivity of the system. We further aspire to obtain data with emotional content to perform research on emotion recognition as well as psychophysiological and usability analysis.