Jorma Laaksonen

2018

This paper describes the MeMAD project entry to the WMT Multimodal Machine Translation Shared Task. We propose adapting the Transformer neural machine translation (NMT) architecture to a multi-modal setting. We also describe the preliminary experiments with text-only translation systems that led us to this choice. According to the automatic metrics for flickr18, ours is the top-scoring system for both English-to-German and English-to-French. Our experiments show that the effect of the visual features in our system is small; our largest gains come from the quality of the underlying text-only NMT system. We find that appropriate use of additional data is effective.

2015

Towards Reliable Automatic Multimodal Content Analysis
Olli-Philippe Lautenbacher | Liisa Tiittula | Maija Hirvonen | Jorma Laaksonen | Mikko Kurimo
Proceedings of the Fourth Workshop on Vision and Language

2014

SLMotion - An extensible sign language oriented video analysis tool
Matti Karppa | Ville Viitaniemi | Marcos Luzardo | Jorma Laaksonen | Tommi Jantunen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present a software toolkit called SLMotion, which provides a framework for automatic and semiautomatic analysis, feature extraction and annotation of individual sign language videos, and which can easily be adapted to batch processing of entire sign language corpora. The program follows a modular design and exposes a NumPy-compatible Python application programming interface that makes it easy and convenient to extend its functionality through scripting. The program includes support for exporting the annotations in ELAN format. The program is released as free software and is available for GNU/Linux and MacOS platforms.

S-pot - a benchmark in spotting signs within continuous signing
Ville Viitaniemi | Tommi Jantunen | Leena Savolainen | Matti Karppa | Jorma Laaksonen
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we present S-pot, a benchmark setting for evaluating the performance of automatic spotting of signs in continuous sign language videos. The benchmark includes 5539 video files of Finnish Sign Language, ground truth sign spotting results, a tool for assessing the spottings against the ground truth, and a repository for storing information on the results. In addition, we will make our sign detection system and the results obtained with it publicly available as a baseline for comparison and further development.

2012

Comparing computer vision analysis of signed language video with motion capture recordings
Matti Karppa | Tommi Jantunen | Ville Viitaniemi | Jorma Laaksonen | Birgitta Burger | Danny De Weerdt
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

We consider a non-intrusive computer-vision method for measuring the motion of a person performing natural signing in video recordings. The quality and usefulness of the method are compared to a traditional marker-based motion capture set-up. The accuracy of descriptors extracted from video footage is assessed qualitatively in the context of sign language analysis by examining whether the shapes of the curves produced by the different means resemble one another in sequences where the shape could be a source of valuable linguistic information. Then, a quantitative comparison is performed, first by correlating the computer-vision-based descriptors with the variables gathered with the motion capture equipment. Finally, multivariate linear and non-linear regression methods are applied for predicting the motion capture variables based on combinations of computer vision descriptors. The results show that even the simple computer vision method evaluated in this paper can produce promisingly good results for assisting researchers working on sign language analysis.

Venues

LREC (3)
WS (2)