Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing
Zifan Jiang, Youngjoon Jang, Liliane Momeni, Gül Varol, Sarah Ebling, Andrew Zisserman
Abstract
The goal of this work is to develop a universal approach for aligning subtitles (i.e., spoken language text with corresponding timestamps) to continuous sign language videos. Prior approaches typically rely on end-to-end training tied to a specific language or dataset, which limits their generality. In contrast, our method Segment, Embed, and Align (SEA) provides a single framework that works across multiple languages and domains. SEA leverages two pretrained models: the first to segment a video sequence into individual signs and the second to embed each sign video clip into a shared latent space with text. Alignment is subsequently performed with a lightweight dynamic programming procedure that runs efficiently on CPU within a minute, even for hour-long episodes. SEA is flexible and can adapt to a wide range of scenarios, utilizing resources from small lexicons to large continuous corpora. Experiments on four sign language datasets demonstrate state-of-the-art alignment performance, highlighting the potential of SEA to generate high-quality parallel data for advancing sign language processing.
- Anthology ID:
- 2026.acl-long.1401
- Volume:
- Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- ACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 30371–30384
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1401/
- Cite (ACL):
- Zifan Jiang, Youngjoon Jang, Liliane Momeni, Gül Varol, Sarah Ebling, and Andrew Zisserman. 2026. Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 30371–30384, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing (Jiang et al., ACL 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.acl-long.1401.pdf
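The abstract says SEA aligns subtitles to signing with a lightweight dynamic programming step over segmented, embedded sign clips, but it does not give the exact formulation. The Python sketch below shows one possible monotonic version under stated assumptions, and it is not the authors' implementation. `align_subtitles` is a hypothetical name. The similarity matrix `sim[i, j]` (subtitle `i`, sign segment `j`) is assumed to come from the shared sign-text embedding space, for example as a cosine similarity. Every segment is assumed to belong to exactly one subtitle, in order. The subtitles' original timestamps are ignored here, although a practical aligner would likely use them as a prior.

```python
import numpy as np


def align_subtitles(sim: np.ndarray) -> list[int]:
    """Hypothetical simplified DP: assign each sign segment to one subtitle.

    sim[i, j] is an (assumed) similarity between subtitle i and segment j.
    The assignment is monotonic: segment j+1 gets the same subtitle as
    segment j or the next one, and every subtitle gets at least one segment.
    Returns labels with labels[j] = subtitle index for segment j.
    """
    m, n = sim.shape
    if m > n:
        raise ValueError("need at least as many segments as subtitles")
    neg_inf = -np.inf
    # dp[i, j]: best total similarity with segment j assigned to subtitle i.
    dp = np.full((m, n), neg_inf)
    # advanced[i, j]: True if segment j-1 belonged to subtitle i-1.
    advanced = np.zeros((m, n), dtype=bool)
    dp[0, 0] = sim[0, 0]
    for j in range(1, n):
        for i in range(min(j + 1, m)):
            stay = dp[i, j - 1]  # keep the same subtitle
            advance = dp[i - 1, j - 1] if i > 0 else neg_inf  # move to next
            if advance > stay:
                dp[i, j] = advance + sim[i, j]
                advanced[i, j] = True
            else:
                dp[i, j] = stay + sim[i, j]
    # Backtrack from the last subtitle at the last segment.
    labels = [0] * n
    i = m - 1
    for j in range(n - 1, -1, -1):
        labels[j] = i
        if j > 0 and advanced[i, j]:
            i -= 1
    return labels
```

For example, with `sim = np.array([[0.9, 0.8, 0.1, 0.0], [0.1, 0.2, 0.7, 0.9]])` the sketch returns `[0, 0, 1, 1]`, so the first two segments go to the first subtitle and the last two go to the second. The recurrence needs O(MN) time and memory for M subtitles and N segments. The paper's own cost terms and constraints may differ.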