Machine Translation from Spoken Language to Sign Language using Pre-trained Language Model as Encoder

Taro Miyazaki, Yusuke Morita, Masanori Sano


Abstract
Sign language is the first language of people who were born deaf or lost their hearing in early childhood, so these individuals need services provided in sign language. To achieve flexible, open-domain services in sign language, machine translation into sign language is needed. Machine translation generally requires large-scale training corpora, but only small corpora are available for sign language. To overcome this data shortage, we developed a method that uses a pre-trained language model of a spoken language as the initial model for the encoder of the machine translation model. We evaluated our method by comparing it with baseline methods, including phrase-based machine translation, using only 130,000 phrase pairs of training data. Our method outperformed the baseline methods, and we found that one cause of translation errors is pointing, a special feature of sign language. We also conducted trials to improve the translation quality for pointing. The results were somewhat disappointing, so we believe there is still room to improve translation quality, especially for pointing.
Anthology ID:
2020.signlang-1.23
Volume:
Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Eleni Efthimiou, Stavroula-Evita Fotinea, Thomas Hanke, Julie A. Hochgesang, Jette Kristoffersen, Johanna Mesch
Venue:
SignLang
Publisher:
European Language Resources Association (ELRA)
Pages:
139–144
Language:
English
URL:
https://aclanthology.org/2020.signlang-1.23
Cite (ACL):
Taro Miyazaki, Yusuke Morita, and Masanori Sano. 2020. Machine Translation from Spoken Language to Sign Language using Pre-trained Language Model as Encoder. In Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives, pages 139–144, Marseille, France. European Language Resources Association (ELRA).
Cite (Informal):
Machine Translation from Spoken Language to Sign Language using Pre-trained Language Model as Encoder (Miyazaki et al., SignLang 2020)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/2020.signlang-1.23.pdf