Alexander H. Liu

2025

pdf bib abs
SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction
Shester Gueuwou | Xiaodan Du | Greg Shakhnarovich | Karen Livescu | Alexander H. Liu
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Sign language processing has traditionally relied on task-specific models, limiting the potential for transfer learning across tasks. Pre-training methods for sign language have typically focused on either supervised pre-training, which cannot take advantage of unlabeled data, or context-independent (frame or video segment) representations, which ignore the effects of relationships across time in sign language. We introduce SHuBERT (Sign Hidden-Unit BERT), a self-supervised contextual representation model learned from approximately 1,000 hours of American Sign Language video. SHuBERT adapts masked token prediction objectives to multi-stream visual sign language input, learning to predict multiple targets corresponding to clustered hand, face, and body pose streams. SHuBERT achieves state-of-the-art performance across multiple tasks including sign language translation, isolated sign language recognition, and fingerspelling detection.

2020

pdf bib abs
Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation
Shun-Po Chuang | Tzu-Wei Sung | Alexander H. Liu | Hung-yi Lee
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Speech translation (ST) aims to learn transformations from speech in the source language to the text in the target language. Previous works show that multitask learning improves the ST performance, in which the recognition decoder generates the text of the source language, and the translation decoder obtains the final translations based on the output of the recognition decoder. Because whether the output of the recognition decoder has the correct semantics is more critical than its accuracy, we propose to improve the multitask ST model by utilizing word embedding as the intermediate.

Co-authors

Greg Shakhnarovich 1

Tzu-Wei Sung 1

Venues

acl2

Fix author