Dynamic Feature Fusion for Sign Language Translation Using HyperNetworks
Ruiquan Zhang | Rui Zhao | Zhicong Wu | Liang Zhang | Haoqi Zhang | Yidong Chen
Findings of the Association for Computational Linguistics: NAACL 2025
This paper presents an efficient dual-stream early fusion method for sign language translation. Inspired by the brain’s ability to process color, shape, and motion simultaneously, the method models the complex dependencies between the RGB and keypoint streams, improving both speed and efficiency. A key challenge is extracting complementary features from the two streams while ensuring global semantic consistency, so as to avoid conflicts and improve generalization. To address this, we propose a hypernetwork-based fusion strategy that extracts salient features from the RGB and keypoint streams, together with a partial shortcut connection training method that strengthens the complementary information between the two streams. We further introduce self-distillation and SST contrastive learning to preserve each stream’s feature advantages while aligning the global semantic space. Experiments show that our method achieves state-of-the-art performance on two public sign language datasets while reducing model parameters by about two-thirds.
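As a rough sketch of the fusion idea described above (not the authors' implementation), the PyTorch snippet below shows one way a hypernetwork can drive dual-stream fusion: a small MLP, conditioned on pooled summaries of both streams, generates per-channel gating weights that mix the RGB and keypoint features. All names here (HyperFusion, dim, the mean-pooling choice) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HyperFusion(nn.Module):
    """Hypernetwork-style fusion sketch: an MLP generates per-channel
    gates conditioned on summaries of both input streams."""

    def __init__(self, dim: int):
        super().__init__()
        # Hypernetwork: maps the concatenated stream summaries to 2*dim gates.
        self.hyper = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, 2 * dim),
        )

    def forward(self, rgb: torch.Tensor, kpt: torch.Tensor) -> torch.Tensor:
        # rgb, kpt: (batch, time, dim) frame-level features from each stream.
        summary = torch.cat([rgb.mean(dim=1), kpt.mean(dim=1)], dim=-1)
        gates = torch.sigmoid(self.hyper(summary))   # (batch, 2*dim)
        g_rgb, g_kpt = gates.chunk(2, dim=-1)        # (batch, dim) each
        # Input-conditioned, per-channel weighting of the two streams.
        return g_rgb.unsqueeze(1) * rgb + g_kpt.unsqueeze(1) * kpt

# Usage: fuse dummy dual-stream features.
fusion = HyperFusion(dim=256)
rgb = torch.randn(4, 32, 256)   # RGB-stream features
kpt = torch.randn(4, 32, 256)   # keypoint-stream features
fused = fusion(rgb, kpt)        # (4, 32, 256)
```

Because the gates are produced by a network rather than learned as fixed parameters, the mixing adapts to each input clip, which is the property a hypernetwork-based strategy relies on; the paper's actual generator, partial shortcut connections, and distillation losses are more involved than this sketch.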