Patrizia Paggio

2024

pdf abs
Multimodal Behaviour in an Online Environment: The GEHM Zoom Corpus Collection
Patrizia Paggio | Manex Agirrezabal | Costanza Navarretta | Leo Vitasovic
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper introduces a novel multimodal corpus consisting of 12 video recordings of Zoom meetings held in English by an international group of researchers from September 2021 to March 2023. The meetings have an average duration of about 40 minutes each, for a total of 8 hours. The number of participants varies from 5 to 9 per meeting. The participants’ speech was transcribed automatically using WhisperX, while visual coordinates of several keypoints of the participants’ head, their shoulders and wrists, were extracted using OpenPose. The audio-visual recordings will be distributed together with the orthographic transcription as well as the visual coordinates. In the paper we describe the way the corpus was collected, transcribed and enriched with the visual coordinates, we give descriptive statistics concerning both the speech transcription and the visual keypoint values and we present and discuss visualisations of these values. Finally, we carry out a short preliminary analysis of the role of feedback in the meetings, and show how visualising the coordinates extracted via OpenPose can be used to see how gestural behaviour supports the use of feedback words during the interaction.

2022

pdf abs
Letters From the Past: Modeling Historical Sound Change Through Diachronic Character Embeddings
Sidsel Boldsen | Patrizia Paggio
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

While a great deal of work has been done on NLP approaches to lexical semantic change detection, other aspects of language change have received less attention from the NLP community. In this paper, we address the detection of sound change through historical spelling. We propose that a sound change can be captured by comparing the relative distance through time between the distributions of the characters involved before and after the change has taken place. We model these distributions using PPMI character embeddings. We verify this hypothesis in synthetic data and then test the method’s ability to trace the well-known historical change of lenition of plosives in Danish historical sources. We show that the models are able to identify several of the changes under consideration and to uncover meaningful contexts in which they appeared. The methodology has the potential to contribute to the study of open questions such as the relative chronology of sound shifts and their geographical distribution.

pdf bib
Proceedings of the 2nd Workshop on People in Vision, Language, and the Mind
Patrizia Paggio | Albert Gatt | Marc Tanti
Proceedings of the 2nd Workshop on People in Vision, Language, and the Mind

2021

pdf abs
Towards a Methodology Supporting Semiautomatic Annotation of HeadMovements in Video-recorded Conversations
Patrizia Paggio | Costanza Navarretta | Bart Jongejan | Manex Agirrezabal
Proceedings of the Joint 15th Linguistic Annotation Workshop (LAW) and 3rd Designing Meaning Representations (DMR) Workshop

We present a method to support the annotation of head movements in video-recorded conversations. Head movement segments from annotated multimodal data are used to train a model to detect head movements in unseen data. The resulting predicted movement sequences are uploaded to the ANVIL tool for post-annotation editing. The automatically identified head movements and the original annotations are compared to assess the overlap between the two. This analysis showed that movement onsets were more easily detected than offsets, and pointed at a number of patterns in the mismatches between original annotations and model predictions that could be dealt with in general terms in post-annotation guidelines.

2020

pdf bib
Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)
Patrizia Paggio | Albert Gatt | Roman Klinger
Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)

pdf abs
Automatic Detection and Classification of Head Movements in Face-to-Face Conversations
Patrizia Paggio | Manex Agirrezabal | Bart Jongejan | Costanza Navarretta
Proceedings of LREC2020 Workshop "People in language, vision and the mind" (ONION2020)

This paper presents an approach to automatic head movement detection and classification in data from a corpus of video-recorded face-to-face conversations in Danish involving 12 different speakers. A number of classifiers were trained with different combinations of visual, acoustic and word features and tested in a leave-one-out cross validation scenario. The visual movement features were extracted from the raw video data using OpenPose, and the acoustic ones using Praat. The best results were obtained by a Multilayer Perceptron classifier, which reached an average 0.68 F1 score across the 12 speakers for head movement detection, and 0.40 for head movement classification given four different classes. In both cases, the classifier outperformed a simple most frequent class baseline as well as a more advanced baseline only relying on velocity features.

pdf abs
Dialogue Act Annotation in a Multimodal Corpus of First Encounter Dialogues
Costanza Navarretta | Patrizia Paggio
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper deals with the annotation of dialogue acts in a multimodal corpus of first encounter dialogues, i.e. face-to- face dialogues in which two people who meet for the first time talk with no particular purpose other than just talking. More specifically, we describe the method used to annotate dialogue acts in the corpus, including the evaluation of the annotations. Then, we present descriptive statistics of the annotation, particularly focusing on which dialogue acts often follow each other across speakers and which dialogue acts overlap with gestural behaviour. Finally, we discuss how feedback is expressed in the corpus by means of feedback dialogue acts with or without co-occurring gestural behaviour, i.e. multimodal vs. unimodal feedback.

2019

pdf abs
Identifying Temporal Trends Based on Perplexity and Clustering: Are We Looking at Language Change?
Sidsel Boldsen | Manex Agirrezabal | Patrizia Paggio
Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change

In this work we propose a data-driven methodology for identifying temporal trends in a corpus of medieval charters. We have used perplexities derived from RNNs as a distance measure between documents and then, performed clustering on those distances. We argue that perplexities calculated by such language models are representative of temporal trends. The clusters produced using the K-Means algorithm give an insight of the differences in language in different time periods at least partly due to language change. We suggest that the temporal distribution of the individual clusters might provide a more nuanced picture of temporal trends compared to discrete bins, thus providing better results when used in a classification task.

2018

pdf
Classifying the Informative Behaviour of Emoji in Microblogs
Giulia Donato | Patrizia Paggio
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf abs
Automatic identification of head movements in video-recorded conversations: can words help?
Patrizia Paggio | Costanza Navarretta | Bart Jongejan
Proceedings of the Sixth Workshop on Vision and Language

We present an approach where an SVM classifier learns to classify head movements based on measurements of velocity, acceleration, and the third derivative of position with respect to time, jerk. Consequently, annotations of head movements are added to new video data. The results of the automatic annotation are evaluated against manual annotations in the same data and show an accuracy of 68% with respect to these. The results also show that using jerk improves accuracy. We then conduct an investigation of the overlap between temporal sequences classified as either movement or non-movement and the speech stream of the person performing the gesture. The statistics derived from this analysis show that using word features may help increase the accuracy of the model.

pdf abs
Investigating Redundancy in Emoji Use: Study on a Twitter Based Corpus
Giulia Donato | Patrizia Paggio
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

In this paper we present an annotated corpus created with the aim of analyzing the informative behaviour of emoji – an issue of importance for sentiment analysis and natural language processing. The corpus consists of 2475 tweets all containing at least one emoji, which has been annotated using one of the three possible classes: Redundant, Non Redundant, and Non Redundant + POS. We explain how the corpus was collected, describe the annotation procedure and the interface developed for the task. We provide an analysis of the corpus, considering also possible predictive features, discuss the problematic aspects of the annotation, and suggest future improvements.

2016

pdf bib abs
The Effect of Gender and Age Differences on the Recognition of Emotions from Facial Expressions
Daniela Schneevogt | Patrizia Paggio
Proceedings of the Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media (PEOPLES)

Recent studies have demonstrated gender and cultural differences in the recognition of emotions in facial expressions. However, most studies were conducted on American subjects. In this paper, we explore the generalizability of several findings to a non-American culture in the form of Danish subjects. We conduct an emotion recognition task followed by two stereotype questionnaires with different genders and age groups. While recent findings (Krems et al., 2015) suggest that women are biased to see anger in neutral facial expressions posed by females, in our sample both genders assign higher ratings of anger to all emotions expressed by females. Furthermore, we demonstrate an effect of gender on the fear-surprise-confusion observed by Tomkins and McCarter (1964); females overpredict fear, while males overpredict surprise.

In this article, we compare feedback-related multimodal behaviours in two different types of interactions: first encounters between two participants who do not know each other in advance, and naturally-occurring conversations between two and three participants recorded at their homes. All participants are Danish native speakers. The interactions are transcribed using the same methodology, and the multimodal behaviours are annotated according to the same annotation scheme. In the study we focus on the most frequently occurring feedback expressions in the interactions and on feedback-related head movements and facial expressions. The analysis of the corpora, while confirming general facts about feedback-related head movements and facial expressions previously reported in the literature, also shows that the physical setting, the number of participants, the topics discussed, and the degree of familiarity influence the use of gesture types and the frequency of feedback-related expressions and gestures.

pdf abs
Feedback in Nordic First-Encounters: a Comparative Study
Costanza Navarretta | Elisabeth Ahlsén | Jens Allwood | Kristiina Jokinen | Patrizia Paggio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper compares how feedback is expressed via speech and head movements in comparable corpora of first encounters in three Nordic languages: Danish, Finnish and Swedish. The three corpora have been collected following common guidelines, and they have been annotated according to the same scheme in the NOMCO project. The results of the comparison show that in this data the most frequent feedback-related head movement is Nod in all three languages. Two types of Nods were distinguished in all corpora: Down-nods and Up-nods; the participants from the three countries use Down- and Up-nods with different frequency. In particular, Danes use Down-nods more frequently than Finns and Swedes, while Swedes use Up-nods more frequently than Finns and Danes. Finally, Finns use more often single Nods than repeated Nods, differing from the Swedish and Danish participants. The differences in the frequency of both Down-nods and Up-Nods in the Danish, Finnish and Swedish interactions are interesting given that Nordic countries are not only geographically near, but are also considered to be very similar culturally. Finally, a comparison of feedback-related words in the Danish and Swedish corpora shows that Swedes and Danes use common feedback words corresponding to yes and no with similar frequency.

2011

pdf
Creating Comparable Multimodal Corpora for Nordic Languages
Costanza Navarretta | Elisabeth Ahlsén | Jens Allwood | Kristiina Jokinen | Patrizia Paggio
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)

2010

pdf
Classification of Feedback Expressions in Multimodal Data
Costanza Navarretta | Patrizia Paggio
Proceedings of the ACL 2010 Conference Short Papers

pdf abs
The NOMCO Multimodal Nordic Resource - Goals and Characteristics
Patrizia Paggio | Jens Allwood | Elisabeth Ahlsén | Kristiina Jokinen | Costanza Navarretta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the multimodal corpora that are being collected and annotated in the Nordic NOMCO project. The corpora will be used to study communicative phenomena such as feedback, turn management and sequencing. They already include video material for Swedish, Danish, Finnish and Estonian, and several social activities are represented. The data will make it possible to verify empirically how gestures (head movements, facial displays, hand gestures and body postures) and speech interact in all the three mentioned aspects of communication. The data are being annotated following the MUMIN annotation scheme, which provides attributes concerning the shape and the communicative functions of head movements, face expressions, body posture and hand gestures. After having described the corpora, the paper discusses how they will be used to study the way feedback is expressed in speech and gestures, and reports results from two pilot studies where we investigated the function of head gestures ― both single and repeated ― in combination with feedback expressions. The annotated corpora will be valuable sources for research on intercultural communication as well as for interaction in the individual languages.

2006

pdf
Information Structure and Pauses in a Corpus of Spoken Danish
Patrizia Paggio
Demonstrations

pdf abs
Annotating Information Structure in a Corpus of Spoken Danish
Patrizia Paggio
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents the work done to annotate a corpus of spoken Danish with information structure tags, and describes a preliminary study in which the corpus has been used to investigate the relation between focus and intra-clausal pauses. The study indicates that the pauses that do fall within the focus domain tend to precede property-expressing words by which the object in focus is distinguished from other similar ones.