Abstract
This paper addresses the challenging problem of sign language translation (SLT), which requires not only visual and textual understanding but also the learning of additional prior knowledge (i.e., signing style and syntax). However, most existing methods with vanilla encoder-decoder structures fail to sufficiently exploit all of these. Motivated by this concern, we propose a novel method called Prior knowledge and memory Enriched Transformer (PET) for SLT, which incorporates this auxiliary information into the vanilla transformer. Concretely, we develop a gated interactive multi-head attention that associates the multimodal representation with the global signing style through adaptive gating functions. A Part-of-Speech (POS) sequence generator relies on the associated information to predict the global syntactic structure, which is then leveraged to guide sentence generation. In addition, considering that the visual-textual context and auxiliary knowledge of a word may appear in more than one video, we design a multi-stream memory structure that stores the detailed correspondence between a word and its various relevant information, leading to a more comprehensive understanding of each word and thus higher-quality translations. We conduct extensive empirical studies on the RWTH-PHOENIX-Weather-2014 dataset under both signer-dependent and signer-independent conditions. The quantitative and qualitative experimental results comprehensively demonstrate the effectiveness of PET.
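To make the gated fusion idea in the abstract more concrete, the following minimal PyTorch sketch gates a global signing-style vector into a standard multi-head attention output. It is an illustration under assumed shapes and names (GatedStyleAttention, style_vec, etc. are hypothetical), not the authors' exact formulation, which is specified in the paper.

```python
# Minimal sketch (not the paper's exact formulation): fuse a global
# signing-style vector into multi-head attention output via a learned gate.
import torch
import torch.nn as nn

class GatedStyleAttention(nn.Module):  # hypothetical module name
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.style_proj = nn.Linear(d_model, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor, style_vec: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) multimodal token representations
        # style_vec: (batch, d_model) global signing-style embedding
        attn_out, _ = self.attn(x, x, x)                 # self-attention over tokens
        style = self.style_proj(style_vec).unsqueeze(1)  # (batch, 1, d_model)
        style = style.expand_as(attn_out)                # broadcast over the sequence
        # adaptive gate decides, per position and channel, how much style to mix in
        g = torch.sigmoid(self.gate(torch.cat([attn_out, style], dim=-1)))
        return g * attn_out + (1.0 - g) * style

# Usage with toy shapes
x = torch.randn(2, 7, 64)        # 2 sequences, 7 frames/tokens, d_model = 64
style_vec = torch.randn(2, 64)   # one global style vector per sequence
out = GatedStyleAttention(64, 8)(x, style_vec)
print(out.shape)                 # torch.Size([2, 7, 64])
```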
- Anthology ID:
- 2022.findings-acl.297
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2022
- Month:
- May
- Year:
- 2022
- Address:
- Dublin, Ireland
- Editors:
- Smaranda Muresan, Preslav Nakov, Aline Villavicencio
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 3766–3775
- URL:
- https://aclanthology.org/2022.findings-acl.297
- DOI:
- 10.18653/v1/2022.findings-acl.297
- Cite (ACL):
- Tao Jin, Zhou Zhao, Meng Zhang, and Xingshan Zeng. 2022. Prior Knowledge and Memory Enriched Transformer for Sign Language Translation. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3766–3775, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal):
- Prior Knowledge and Memory Enriched Transformer for Sign Language Translation (Jin et al., Findings 2022)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/2022.findings-acl.297.pdf
- Data
- PHOENIX14T, RWTH-PHOENIX-Weather 2014T