Onno Crasborn
We describe a sign language documentation project funded by the Endangered Languages Documentation Project (ELDP) in the province of Kermanshah in the west of Iran. The deposit at the ELDP archive (elararchive.org) includes recordings of 38 native signers of Zaban Eshareh Irani living in Kermanshah. The recordings start with an elicitation of the signs of the Farsi alphabet, along with fingerspelling of some words and vocabulary elicitation for some basic concepts. Subsequently, the participants watch short movies and are asked to retell the story. Later, the participants have natural conversations in pairs, guided by a deaf moderator. Initial annotations of ID-glosses and translations into Persian and English were also archived. ID-glosses are stored as a dataset in Global Signbank, along with a citation form of each sign and its phonological description. The resulting datasets and one hour of annotated conversations are available to other researchers in the ELDP archive.
For developing sign language technologies like automatic translation, huge amounts of training data are required. Even the larger corpora available for some sign languages are tiny compared to the amounts of data used for corresponding spoken language technologies. The overarching goal of the European project EASIER is to develop a framework for bidirectional automatic translation between sign and spoken languages and between sign languages. One part of this multi-dimensional project is that it will pool available language resources from European sign languages into a larger dataset to address the data scarcity problem. This approach promises to open the floor for lower-resourced sign languages in Europe. This article focusses on efforts in the EASIER project to allow for new languages to make use of such technologies in the future. What are the characteristics of sign language resources needed to train recognition, translation, and synthesis algorithms, and how can other countries including those without any sign resources follow along with these developments? The efforts undertaken in EASIER include creating workflow documents and organizing training sessions in online workshops. They reflect the current state of the art, and will likely need to be updated in the coming decade.
Lexicostatistics is the main method used in previous work measuring linguistic distances between sign languages. As a method, it disregards any possible structural/grammatical similarity, instead focusing exclusively on lexical items, but it is time-consuming as it requires some comparable phonological coding (i.e. form description) as well as concept matching (i.e. meaning description) of signs across the sign languages to be compared. In this paper, we present a novel approach for measuring lexical similarity across any two sign languages using the Global Signbank platform, a lexical database of uniformly coded signs. The method involves a feature-by-feature comparison of all matched phonological features. This method can be used in two distinct ways: 1) automatically comparing the amount of lexical overlap between two sign languages (with a more detailed feature description than previous lexicostatistical methods); 2) finding exact form matches across languages that are either matched or mismatched in meaning (i.e. true or false friends). We show the feasibility of this method by comparing three languages (datasets) in Global Signbank, and are currently expanding both the size of these three datasets and the total number of datasets.
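The feature-by-feature comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Global Signbank API: the feature names in `FEATURES`, the toy lexicons, and the functions `feature_similarity`, `lexical_overlap`, and `form_matches` are all hypothetical, and the scoring rule (share of identical values among features coded in both signs) is an assumption made for illustration.

```python
from itertools import product

# Hypothetical phonological features; a real dataset would code many more.
FEATURES = ("handshape", "location", "movement", "orientation")


def feature_similarity(sign_a, sign_b):
    """Share of features coded in BOTH signs whose values are identical."""
    shared = [f for f in FEATURES
              if sign_a.get(f) is not None and sign_b.get(f) is not None]
    if not shared:
        return 0.0
    return sum(sign_a[f] == sign_b[f] for f in shared) / len(shared)


def lexical_overlap(lexicon_a, lexicon_b):
    """Mean feature similarity over the concepts present in both lexicons."""
    concepts = sorted(set(lexicon_a) & set(lexicon_b))
    if not concepts:
        return 0.0
    scores = [feature_similarity(lexicon_a[c], lexicon_b[c]) for c in concepts]
    return sum(scores) / len(scores)


def form_matches(lexicon_a, lexicon_b):
    """Exact form matches, split into true friends (same concept)
    and false friends (different concepts)."""
    true_friends, false_friends = [], []
    for (ca, sa), (cb, sb) in product(lexicon_a.items(), lexicon_b.items()):
        if feature_similarity(sa, sb) == 1.0:
            (true_friends if ca == cb else false_friends).append((ca, cb))
    return true_friends, false_friends


# Toy lexicons (invented values), keyed by concept.
lang_a = {
    "HOUSE": {"handshape": "B", "location": "neutral", "movement": "down", "orientation": "palm-in"},
    "TREE": {"handshape": "5", "location": "neutral", "movement": "twist", "orientation": "palm-side"},
    "EAT": {"handshape": "O", "location": "mouth", "movement": "repeat", "orientation": "palm-in"},
}
lang_b = {
    "HOUSE": {"handshape": "B", "location": "neutral", "movement": "down", "orientation": "palm-in"},
    "TREE": {"handshape": "5", "location": "neutral", "movement": "up", "orientation": "palm-side"},
    "DRINK": {"handshape": "O", "location": "mouth", "movement": "repeat", "orientation": "palm-in"},
}

overlap = lexical_overlap(lang_a, lang_b)
true_friends, false_friends = form_matches(lang_a, lang_b)
print(overlap, true_friends, false_friends)
```

In this toy data, HOUSE is an exact form match with the same meaning (a true friend), while the form coded for EAT in one lexicon is identical to the form coded for DRINK in the other (a false friend). Comparing every sign pair is quadratic in lexicon size, which is acceptable for a sketch but would likely need indexing by form for larger datasets.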
This paper discusses some improvements in recent and planned versions of the multimodal annotation tool ELAN, which are targeted at improving the usability of annotated files. Increased support for multilingual documents is provided, by allowing for multilingual vocabularies and by specifying a language per document, annotation layer (tier) or annotation. In addition, improvements in the search possibilities and the display of the results have been implemented, which are especially relevant in the interpretation of the results of complex multi-tier searches.
The Sign Linguistics Corpora Network is a three-year network initiative that aims to collect existing knowledge and practices on the creation and use of signed language resources. The concrete goals are to organise a series of four workshops in 2009 and 2010, create a stable Internet location for such knowledge, and generate new ideas for employing the most recent technologies for the study of signed languages. The network covers a wide range of subjects: data collection, metadata, annotation, and exploitation; these are the topics of the four workshops. The outcomes of the first two workshops are summarised in this paper; both workshops demonstrated that the need for dedicated knowledge on sign language corpora is especially salient in countries where researchers work alone or in small groups, which is still quite common in many places in Europe. While the original goal of the network was primarily to focus on corpus linguistics and language documentation, human language technology has gradually been incorporated as a user group of signed language resources.
The SignSpeak project will be the first step to approach sign language recognition and translation at a scientific level already reached in similar research fields such as automatic speech recognition or statistical machine translation of spoken languages. Deaf communities revolve around sign languages as they are their natural means of communication. Although deaf, hard of hearing and hearing signers can communicate without problems amongst themselves, there is a serious challenge for the deaf community in trying to integrate into educational, social and work environments. The overall goal of SignSpeak is to develop a new vision-based technology for recognizing and translating continuous sign language to text. New knowledge about the nature of sign language structure from the perspective of machine recognition of continuous sign language will allow a subsequent breakthrough in the development of a new vision-based technology for continuous sign language recognition and translation. Existing and new publicly available corpora will be used to evaluate the research progress throughout the whole project.