Using viseme recognition to improve a sign language translation system

Christoph Schmidt, Oscar Koller, Hermann Ney, Thomas Hoyoux, Justus Piater
Abstract
Sign language-to-text translation systems are similar to spoken language translation systems in that they consist of a recognition phase and a translation phase. First, the video of a person signing is transformed into a transcription of the signs, which is then translated into the text of a spoken language. One distinctive feature of sign languages is their multi-modal nature, as they can express meaning simultaneously via hand movements, body posture and facial expressions. In some sign languages, certain signs are accompanied by mouthings, i.e. the person silently pronounces the word while signing. In this work, we closely integrate a recognition and translation framework by adding a viseme recognizer (“lip reading system”) based on an active appearance model and by optimizing the recognition system to improve the translation output. The system outperforms the standard approach of separate recognition and translation.
Anthology ID: 2013.iwslt-papers.1
Volume: Proceedings of the 10th International Workshop on Spoken Language Translation: Papers
Month: December 5-6
Year: 2013
Address: Heidelberg, Germany
Venue: IWSLT
SIG: SIGSLT
URL: https://aclanthology.org/2013.iwslt-papers.1
Cite (ACL): Christoph Schmidt, Oscar Koller, Hermann Ney, Thomas Hoyoux, and Justus Piater. 2013. Using viseme recognition to improve a sign language translation system. In Proceedings of the 10th International Workshop on Spoken Language Translation: Papers, Heidelberg, Germany.
Cite (Informal): Using viseme recognition to improve a sign language translation system (Schmidt et al., IWSLT 2013)
PDF: https://preview.aclanthology.org/ingestion-script-update/2013.iwslt-papers.1.pdf