Jens Allwood


Bring vs. MTRoget: Evaluating automatic thesaurus translation
Lars Borin | Jens Allwood | Gerard de Melo
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Evaluation of automatic language-independent methods for language technology resource creation is difficult, and confounded by a largely unknown quantity, viz. to what extent typological differences among languages are significant for results achieved for one language or language pair to be applicable across languages generally. In the work presented here, as a simplifying assumption, language-independence is taken as axiomatic within certain specified bounds. We evaluate the automatic translation of Roget’s “Thesaurus” from English into Swedish using an independently compiled Roget-style Swedish thesaurus, S.C. Bring’s “Swedish vocabulary arranged into conceptual classes” (1930). Our expectation is that this explicit evaluation of one of the thesauruses created in the MTRoget project will provide a good estimate of the quality of the other thesauruses created using similar methods.


Feedback in Nordic First-Encounters: a Comparative Study
Costanza Navarretta | Elisabeth Ahlsén | Jens Allwood | Kristiina Jokinen | Patrizia Paggio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The paper compares how feedback is expressed via speech and head movements in comparable corpora of first encounters in three Nordic languages: Danish, Finnish and Swedish. The three corpora have been collected following common guidelines, and they have been annotated according to the same scheme in the NOMCO project. The results of the comparison show that in this data the most frequent feedback-related head movement is Nod in all three languages. Two types of Nods were distinguished in all corpora: Down-nods and Up-nods; the participants from the three countries use Down- and Up-nods with different frequency. In particular, Danes use Down-nods more frequently than Finns and Swedes, while Swedes use Up-nods more frequently than Finns and Danes. Furthermore, Finns use single Nods more often than repeated Nods, unlike the Swedish and Danish participants. The differences in the frequency of both Down-nods and Up-nods in the Danish, Finnish and Swedish interactions are interesting given that Nordic countries are not only geographically near, but are also considered to be very similar culturally. Finally, a comparison of feedback-related words in the Danish and Swedish corpora shows that Swedes and Danes use common feedback words corresponding to yes and no with similar frequency.


Creating Comparable Multimodal Corpora for Nordic Languages
Costanza Navarretta | Elisabeth Ahlsén | Jens Allwood | Kristiina Jokinen | Patrizia Paggio
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011)


The NOMCO Multimodal Nordic Resource - Goals and Characteristics
Patrizia Paggio | Jens Allwood | Elisabeth Ahlsén | Kristiina Jokinen | Costanza Navarretta
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the multimodal corpora that are being collected and annotated in the Nordic NOMCO project. The corpora will be used to study communicative phenomena such as feedback, turn management and sequencing. They already include video material for Swedish, Danish, Finnish and Estonian, and several social activities are represented. The data will make it possible to verify empirically how gestures (head movements, facial displays, hand gestures and body postures) and speech interact in all three of these aspects of communication. The data are being annotated following the MUMIN annotation scheme, which provides attributes concerning the shape and the communicative functions of head movements, facial expressions, body posture and hand gestures. After having described the corpora, the paper discusses how they will be used to study the way feedback is expressed in speech and gestures, and reports results from two pilot studies where we investigated the function of head gestures, both single and repeated, in combination with feedback expressions. The annotated corpora will be valuable sources for research on intercultural communication as well as for interaction in the individual languages.

Work on Spoken (Multimodal) Language Corpora in South Africa
Jens Allwood | Harald Hammarström | Andries Hendrikse | Mtholeni N. Ngcobo | Nozibele Nomdebevana | Laurette Pretorius | Mac van der Merwe
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes past, ongoing and planned work on the collection and transcription of spoken language samples for all the South African official languages and, as part of this, the training of researchers in corpus linguistic research skills. More specifically, the work has involved (and still involves) establishing an international corpus linguistic network linked to a network hub at a UNISA website and the development of research tools, a corpus research guide and workbook for multimodal communication and spoken language corpus research. As an example of the work we are doing and hope to do more of in the future, we present a small pilot study of the influence of English and Afrikaans on the 100 most frequent words in spoken Xhosa, as evidenced in the corpus of spoken interaction we have gathered so far. Other planned work, besides work on spoken language phenomena, involves comparison of spoken and written language and work on communicative body movements (gestures) and their relation to speech.


Annotations and Tools for an Activity Based Spoken Language Corpus
Jens Allwood | Leif Groenqvist | Elisabeth Ahlsen | Magnus Gunnarsson
Proceedings of the Second SIGdial Workshop on Discourse and Dialogue


Towards Multimodal Spoken Language Corpora: TransTool and SyncTool
Joakim Nivre | Jens Allwood | Jenny Holm | Dario Lopez-Kasten
Partially Automated Techniques for Transcribing Naturally Occurring Continuous Speech