An audiovisual political speech analysis incorporating eye-tracking and perception data

Stefan Scherer, Georg Layher, John Kane, Heiko Neumann, Nick Campbell


Abstract
We investigate the influence of audiovisual features on the perception of politicians' speaking style and performance, using a large publicly available dataset of German parliament recordings. We conduct a human perception experiment that includes eye-tracker data to evaluate human ratings and gaze behavior in two separate conditions: audiovisual and video-only. The ratings are collected on a five-dimensional scale comprising insecurity, monotony, expressiveness, persuasiveness, and overall performance. They are then statistically analyzed and put into context in a multimodal feature analysis involving measures of prosody, voice quality, and motion energy. The analysis reveals several statistically significant features, such as pause timing, voice quality measures, and motion energy, that correlate strongly, either positively or negatively, with certain human ratings of speaking style. Additionally, we compare the gaze behavior of the human subjects to evaluate salient regions in the audiovisual and video-only conditions. The eye-tracking analysis reveals significant changes in gaze behavior: in the audiovisual condition, participants narrow their focus of attention mainly to the politician's face, whereas in the video-only condition they scan the upper body, including the hands and arms.
Anthology ID:
L12-1600
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1114–1120
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/1011_Paper.pdf
Cite (ACL):
Stefan Scherer, Georg Layher, John Kane, Heiko Neumann, and Nick Campbell. 2012. An audiovisual political speech analysis incorporating eye-tracking and perception data. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1114–1120, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
An audiovisual political speech analysis incorporating eye-tracking and perception data (Scherer et al., LREC 2012)