This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
AnneliesBraffort
Also published as:
A. Braffort
Fixing paper assignments
Please select all papers that do not belong to this person.
Indicate below which author they should be assigned to.
This article presents an original method for Text-to-Sign Translation. It compensates data scarcity using a domain-specific parallel corpus of alignments between text and hierarchical formal descriptions of Sign Language videos. Based on the detection of similarities present in the source text, the proposed algorithm recursively exploits matches and substitutions of aligned segments to build multiple candidate translations for a novel statement. This helps preserving Sign Language structures as much as possible before falling back on literal translations too quickly, in a generative way. The resulting translations are in the form of AZee expressions, designed to be used as input to avatar synthesis systems. We present a test set tailored to showcase its potential for expressiveness and generation of idiomatic target language, and observed limitations. This work finally opens prospects on how to evaluate this kind of translation.
This paper is a brief summary of the First WMT Shared Task on Sign Language Translation (WMT-SLT22), a project partly funded by EAMT. The focus of this shared task is automatic translation between signed and spoken languages. Details can be found on our website (https://www.wmt-slt.com/) or in the findings paper (Müller et al., 2022).
Cet article présente une expérimentation de traduction automatique de texte vers la langue des signes (LS). Comme nous ne disposons pas de corpus aligné de grande taille, nous avons exploré une approche à base d’exemples, utilisant AZee, une représentation intermédiaire du discours en LS sous la forme d’expressions hiérarchisées
This paper presents the results of the Second WMT Shared Task on Sign Language Translation (WMT-SLT23; https://www.wmt-slt.com/). This shared task is concerned with automatic translation between signed and spoken languages. The task is unusual in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known paradigm of text-to-text machine translation (MT). The task offers four tracks involving the following languages: Swiss German Sign Language (DSGS), French Sign Language of Switzerland (LSF-CH), Italian Sign Language of Switzerland (LIS-CH), German, French and Italian. Four teams (including one working on a baseline submission) participated in this second edition of the task, all submitting to the DSGS-to-German track. Besides a system ranking and system papers describing state-of-the-art techniques, this shared task makes the following scientific contributions: novel corpora and reproducible baseline systems. Finally, the task also resulted in publicly available sets of system outputs and more human evaluation scores for sign language translation.
This article presents a new French Sign Language (LSF) corpus called “Rosetta-LSF”. It was created to support future studies on the automatic translation of written French into LSF, rendered through the animation of a virtual signer. An overview of the field highlights the importance of a quality representation of LSF. In order to obtain quality animations understandable by signers, it must surpass the simple “gloss transcription” of the LSF lexical units to use in the discourse. To achieve this, we designed a corpus composed of four types of aligned data, and evaluated its usability. These are: news headlines in French, translations of these headlines into LSF in the form of videos showing animations of a virtual signer, gloss annotations of the “traditional” type—although including additional information on the context in which each gestural unit is performed as well as their potential for adaptation to another context—and AZee representations of the videos, i.e. formal expressions capturing the necessary and sufficient linguistic information. This article describes this data, exhibiting an example from the corpus. It is available online for public research.
Sign Language (SL) animations generated from motion capture (mocap) of real signers convey critical information about their identity. It has been suggested that this information is mostly carried by statistics of the movements kinematics. Manipulating these statistics in the generation of SL movements could allow controlling the identity of the signer, notably to preserve anonymity. This paper tests this hypothesis by presenting a novel synthesis algorithm that manipulates the identity-specific statistics of mocap recordings. The algorithm produced convincing new versions of French Sign Language discourses, which accurately modulated the identity prediction of a machine learning model. These results open up promising perspectives toward the automatic control of identity in the motion animation of virtual signers.
This article presents an original method for automatic generation of sign language (SL) content by means of the animation of an avatar, with the aim of creating animations that respect as much as possible linguistic constraints while keeping bio-realistic properties. This method is based on the use of a domain-specific bilingual corpus richly annotated with timed alignments between SL motion capture data, text and hierarchical expressions from the framework called AZee at subsentential level. Animations representing new SL content are built from blocks of animations present in the corpus and adapted to the context if necessary. A smart blending approach has been designed that allows the concatenation, replacement and adaptation of original animation blocks. This approach has been tested on a tailored testset to show as a proof of concept its potential in comprehensibility and fluidity of the animation, as well as its current limits.
This paper presents the results of the First WMT Shared Task on Sign Language Translation (WMT-SLT22).This shared task is concerned with automatic translation between signed and spoken languages. The task is novel in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known paradigm of text-to-text machine translation (MT).The task featured two tracks, translating from Swiss German Sign Language (DSGS) to German and vice versa. Seven teams participated in this first edition of the task, all submitting to the DSGS-to-German track. Besides a system ranking and system papers describing state-of-the-art techniques, this shared task makes the following scientific contributions: novel corpora, reproducible baseline systems and new protocols and software for human evaluation. Finally, the task also resulted in the first publicly available set of system outputs and human evaluation scores for sign language translation.
Development of automatic translation between signed and spoken languages has lagged behind the development of automatic translation between spoken languages, but it is a common misperception that extending machine translation techniques to include signed languages should be a straightforward process. A contributing factor is the lack of an acceptable method for displaying sign language apart from interpreters on video. This position paper examines the challenges of displaying a signed language as a target in automatic translation, analyses the underlying causes and suggests strategies to develop display technologies that are acceptable to sign language communities.
While the research in automatic Sign Language Processing (SLP) is growing, it has been almost exclusively focused on recognizing lexical signs, whether isolated or within continuous SL production. However, Sign Languages include many other gestural units like iconic structures, which need to be recognized in order to go towards a true SL understanding. In this paper, we propose a newer version of the publicly available SL corpus Dicta-Sign, limited to its French Sign Language part. Involving 16 different signers, this dialogue corpus was produced with very few constraints on the style and content. It includes lexical and non-lexical annotations over 11 hours of video recording, with 35000 manual units. With the aim of stimulating research in SL understanding, we also provide a baseline for the recognition of lexical signs and non-lexical structures on this corpus. A very compact modeling of a signer is built and a Convolutional-Recurrent Neural Network is trained and tested on Dicta-Sign-LSF-v2, with state-of-the-art results, including the ability to detect iconicity in SL production.
This paper presents MEDIAPI-SKEL, a 2D-skeleton database of French Sign Language videos aligned with French subtitles. The corpus contains 27 hours of video of body, face and hand keypoints, aligned to subtitles with a vocabulary size of 17k tokens. In contrast to existing sign language corpora such as videos produced under laboratory conditions or translations of TV programs into sign language, this database is constructed using original sign language content largely produced by deaf journalists at the media company Média-Pi. Moreover, the videos are accurately synchronized with French subtitles. We propose three challenges appropriate for this corpus that are related to processing units of signs in context: automatic alignment of text and video, semantic segmentation of sign language, and production of video-text embeddings for cross-modal retrieval. These challenges deviate from the classic task of identifying a limited number of lexical signs in a video stream.
In a lot of recent research, attention has been drawn to recognizing sequences of lexical signs in continuous Sign Language corpora, often artificial. However, as SLs are structured through the use of space and iconicity, focusing on lexicon only prevents the field of Continuous Sign Language Recognition (CSLR) from extending to Sign Language Understanding and Translation. In this article, we propose a new formulation of the CSLR problem and discuss the possibility of recognizing higher-level linguistic structures in SL videos, like classifier constructions. These structures show much more variability than lexical signs, and are fundamentally different than them in the sense that form and meaning can not be disentangled. Building on the recently published French Sign Language corpus Dicta-Sign-LSF-v2, we discuss the performance and relevance of a simple recurrent neural network trained to recognize illustrative structures.
In this paper, we describe DEGELS1, a comparable corpus of French Sign Language and co-speech gestures that has been created to serve as a testbed corpus for the DEGELS workshops. These workshop series were initiated in France for researchers studying French Sign Language and co-speech gestures in French, with the aim of comparing methodologies for corpus annotation. An extract was used for the first event DEGELS2011 dedicated to the annotation of pointing, and the same extract will be used for DEGELS2012, dedicated to segmentation.
Cet article présente Dicta-Sign, un projet de recherche sur le traitement automatique des langues des signes (LS), qui aborde un grand nombre de questions de recherche : linguistique de corpus, modélisation linguistique, reconnaissance et génération automatique. L’objectif de ce projet est de réaliser trois applications prototypes destinées aux usagers sourds : un traducteur de termes de LS à LS, un outil de recherche par l’exemple et un Wiki en LS. Pour cela, quatre corpus comparables de cinq heures de dialogue seront produits et analysés. De plus, des avancées significatives sont attendues dans le domaine des outils d’annotation. Dans ce projet, le LIMSI est en charge de l’élaboration des modèles linguistiques et participe aux aspects corpus et génération automatique. Nous nous proposons d’illustrer l’état d’avancement de Dicta-Sign au travers de vidéos extraites du corpus et de démonstrations des outils de traitement et de génération d’animations de signeur virtuel.
Sign Languages (SLs) are the visuo-gestural languages practised by the deaf communities. Research on SLs requires to build, to analyse and to use corpora. The aim of this paper is to present various kinds of new uses of SL corpora. The way data are used take advantage of the new capabilities of annotation software for visualisation, numerical annotation, and processing. The nature of the data can be video-based or motion capture-based. The aims of the studies include language analysis, animation processing, and evaluation. We describe here some LIMSIs studies, and some studies from other laboratories as examples.
This paper deals with non manual gestures annotation involved in Sign Language within the context of automatic generation of Sign Language. We will tackle linguistic researches in sign language, present descriptions of non manual gestures and problems lead to movement description. Then, we will propose a new annotation methodology, which allows non manual gestures description. This methodology can describe all Non Manual Gestures with precision, economy and simplicity. It is based on four points: Movement description (instead of position description); Movement decomposition (the diagonal movement is described with horizontal movement and vertical movement separately); Element decomposition (we separate higher eyelid and lower eyelid); Use of a set of symbols rather than words. One symbol can describe many phenomena (with use of colours, height...). First analysis results allow us to define precisely the structure of eye blinking and give the very first ideas for the rules to be designed. All the results must be refined and confirmed by extending the study on the whole corpus. In a second step, our annotation will be used to produce analyses in order to define rules and structure definition of Non Manual Gestures that will be evaluate in LIMSIs automatic French Sign Language generation system.
This paper presents a study on synchronization of linguistic annotation and numerical data on a video corpus of French Sign Language. We detail the methodology and sketches out the potential observations that can be provided by such a kind of mixed annotation. The corpus is composed of three views: close-up, frontal and top. Some image processing has been performed on each video in order to provide global information on the movement of the signers. That consists of the size and position of a bounding box surrounding the signer. Linguists have studied this corpus and have provided annotations on iconic structures, such as "personal transfers" (role shifts). We used an annotation software, ANVIL, to synchronize linguistic annotation and numerical data. This new approach of annotation seems promising for automatic detection of linguistic phenomena, such as classification of the signs according to their size in the signing space, and detection of some iconic structures. Our first results must be consolidated and extended on the whole corpus. The next step will consist of designing automatic processes in order to assist SL annotation.