Workshop on Sign Language Processing (2025)



pdf (full)
bib (full)
Proceedings of the Workshop on Sign Language Processing (WSLP)

pdf bib
Proceedings of the Workshop on Sign Language Processing (WSLP)
Mohammed Hasanuzzaman | Facundo Manuel Quiroga | Ashutosh Modi | Sabyasachi Kamila | Keren Artiaga | Abhinav Joshi | Sanjeet Singh

pdf bib
Overview of the First Workshop on Sign Language Processing (WSLP 2025)
Sanjeet Singh | Abhinav Joshi | Keren Artiaga | Mohammed Hasanuzzaman | Facundo Manuel Quiroga | Sabyasachi Kamila | Ashutosh Modi

We organized the First Workshop on Sign Language Processing (WSLP 2025), co-located with IJCNLP–AACL 2025 at IIT Bombay, to bring together researchers, linguists, and members of the Deaf community and accelerate computational work on under-resourced sign languages. The workshop accepted ten papers—including two official shared-task submissions—that introduced new large-scale resources (a continuous ISL fingerspelling corpus, cross-lingual HamNoSys corpora), advanced multilingual and motion-aware translation models, explored LLM-based augmentation and glossing strategies, and presented lightweight deployable systems for regional languages such as Odia. We ran a three-track shared task on Indian Sign Language that attracted over sixty registered teams and established the first public leaderboards for sentence-level ISL-to-English translation, isolated word recognition, and word-presence prediction. By centring geographic, linguistic, and organiser diversity, releasing open datasets and benchmarks, and explicitly addressing linguistic challenges unique to visual–spatial languages, we significantly broadened the scope of sign-language processing beyond traditionally dominant European and East-Asian datasets, laying a robust foundation for inclusive, equitable, and deployable sign-language AI in the Global South.

pdf bib
Indian Sign Language Recognition and Translation into Odia
Astha Swarupa Nayak | Naisargika Subudhi | Tannushree Rana | Muktikanta Sahu | Rakesh Chandra Balabantaray

Sign language is a vital means of communication for the deaf and hard-of-hearing community. However, translating Indian Sign Language (ISL) into regional languages like Odia remains a significant technological challenge due to the language’s rich morphology, agglutinative grammar, and complex script. This work presents a real-time ISL recognition and translation system that converts hand gestures into Odia text, enhancing accessibility and promoting inclusive communication. The system leverages MediaPipe for real-time key-point detection and uses a custom-built dataset of 1,200 samples across 12 ISL gesture classes, captured under diverse Indian backgrounds and lighting conditions to ensure robustness. Both 2D and 3D Convolutional Neural Networks (CNNs) were explored, with the 2D CNN achieving superior performance (98.33% test accuracy) compared to the 3D CNN’s 78.33%. Recognized gestures are translated into Odia using a curated gesture-to-text mapping dictionary, seamlessly integrated into a lightweight Tkinter-based GUI. Unlike other resource-heavy systems, this model is optimized for deployment on low-resource devices, making it suitable for rural and educational contexts. Beyond translation, the system can function as an assistive learning tool for students and educators of ISL. This work demonstrates that combining culturally curated datasets with efficient AI models can bridge communication gaps and create regionally adapted, accessible technology for the deaf and mute community in India.
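As a rough Python sketch of the pipeline described above (MediaPipe key-point detection, a compact 2D CNN classifier, and a gesture-to-Odia lookup), the following assumes hypothetical class labels, input shapes, and dictionary entries; it is not the authors' released code.

# Minimal sketch: MediaPipe key-point detection, a small 2D CNN gesture classifier,
# and a gesture-to-Odia lookup table. All names and values below are illustrative.
import cv2
import mediapipe as mp
import numpy as np
import tensorflow as tf

GESTURE_TO_ODIA = {0: "ନମସ୍କାର", 1: "ଧନ୍ୟବାଦ"}  # hypothetical 12-class mapping (2 shown)

hands = mp.solutions.hands.Hands(static_image_mode=False, max_num_hands=2)

def extract_keypoints(frame_bgr):
    """Return (42, 3) hand landmarks from one frame, zero-padded if a hand is absent."""
    result = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    points = np.zeros((42, 3), dtype=np.float32)
    if result.multi_hand_landmarks:
        for h, hand in enumerate(result.multi_hand_landmarks[:2]):
            for i, lm in enumerate(hand.landmark):
                points[h * 21 + i] = (lm.x, lm.y, lm.z)
    return points

def build_2d_cnn(num_classes=12):
    """Small 2D CNN over a (42, 3, 1) key-point 'image', as one possible design."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(42, 3, 1)),
        tf.keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 1)),
        tf.keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

model = build_2d_cnn()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# After training: predict a class for one frame and map it to Odia text.
# pred = int(np.argmax(model.predict(extract_keypoints(frame)[None, ..., None])))
# print(GESTURE_TO_ODIA.get(pred, "?"))

The sketch classifies key points from a single frame; the paper's 2D CNN may instead operate directly on image frames, so treat the input representation here as one possible design choice.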

pdf bib
Low-Resource Sign Language Glossing Profits From Data Augmentation
Diana Vania Lara Ortiz | Sebastian Padó

Glossing is the task of translating from a written language into a sequence of glosses, i.e., textual representations of signs from some sign language. While glossing is in principle ‘just’ a machine translation (MT) task, sign languages still lack the large parallel corpora that exist for many written language pairs and underlie the development of dedicated MT systems. In this work, we demonstrate that glossing can be significantly improved through data augmentation. We fine-tune a Spanish transformer model both on a small dedicated corpus of 3,000 Spanish–Mexican Sign Language (MSL) gloss sentence pairs, and on a corpus augmented with an English–American Sign Language (ASL) gloss corpus. We obtain the best results when we oversample from the ASL corpus by a factor of ~4, achieving a BLEU increase from 62 to 85 and a TER reduction from 44 to 20. This demonstrates the usefulness of combining corpora in low-resource glossing situations.
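A minimal Python sketch of the oversampling strategy described above; file names, the file format, and the exact factor are assumptions, not the paper's setup.

# Illustrative sketch: the small Spanish->MSL gloss corpus is combined with
# roughly 4 copies of an English->ASL gloss corpus before fine-tuning.
import random

def load_parallel(path):
    """Load tab-separated 'text <TAB> gloss' pairs (hypothetical file format)."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f if "\t" in line]

msl = load_parallel("es_msl_glosses.tsv")   # ~3,000 pairs (primary task)
asl = load_parallel("en_asl_glosses.tsv")   # auxiliary corpus

OVERSAMPLE_FACTOR = 4                       # best-performing factor reported (~4x)
train = msl + asl * OVERSAMPLE_FACTOR
random.shuffle(train)
# 'train' is then fed to the usual fine-tuning loop of the Spanish transformer model.

Oversampling simply repeats the auxiliary pairs so that roughly four ASL examples are seen for every MSL example during fine-tuning.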

pdf bib
Augmenting Sign Language Translation Datasets with Large Language Models
Pedro Alejandro Dal Bianco | Jean Paul Nunes Reinhold | Facundo Manuel Quiroga | Franco Ronchetti

Sign language translation (SLT) is a challenging task due to the scarcity of labeled data and the heavy-tailed distribution of sign language vocabularies. In this paper, we explore a novel data augmentation approach for SLT: using a large language model (LLM) to generate paraphrases of the target language sentences in the training data. We experiment with a Transformer-based SLT model (Signformer) on three datasets spanning German, Greek, and Argentinian Sign Languages. For models trained with augmentation, we adopt a two-stage regime: pre-train on the LLM-augmented corpus and then fine-tune on the original, non-augmented training set. Our augmented training sets, expanded with GPT-4-generated paraphrases, yield mixed results. On a medium-scale German SL corpus (PHOENIX14T), LLM augmentation improves BLEU-4 from 9.56 to 10.33. In contrast, a small-vocabulary Greek SL dataset with a near-perfect baseline (94.38 BLEU) sees a slight drop to 92.22 BLEU, and a complex Argentinian SL corpus with a long-tail vocabulary distribution remains around 1.2 BLEU despite augmentation. We analyze these outcomes in relation to each dataset’s complexity and token frequency distribution, finding that LLM-based augmentation is more beneficial when the dataset contains a richer vocabulary and many infrequent tokens. To our knowledge, this work is the first to apply LLM paraphrasing to SLT, and we discuss these results with respect to prior data augmentation efforts in sign language translation.
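A hedged Python sketch of the augmentation step described above: for each target-language sentence in the training set, an LLM is asked for paraphrases, and each paraphrase is paired with the original sign video. The prompt, model name, and paraphrase count are assumptions, not the paper's exact configuration.

# Sketch only: LLM paraphrase augmentation followed by the two-stage regime.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def paraphrase(sentence, n=3, model="gpt-4"):
    """Return up to n paraphrases of a target-language sentence."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Give {n} paraphrases of the following sentence, one per line:\n{sentence}",
        }],
    )
    lines = response.choices[0].message.content.strip().splitlines()
    return [l.strip("-• ").strip() for l in lines if l.strip()][:n]

def augment(pairs):
    """pairs: list of (video_features, sentence). Each paraphrase reuses the same video."""
    augmented = list(pairs)
    for video, sentence in pairs:
        augmented += [(video, p) for p in paraphrase(sentence)]
    return augmented

# Stage 1: pre-train the SLT model on augment(train_pairs).
# Stage 2: fine-tune on the original, non-augmented train_pairs.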

pdf bib
Multilingual Sign Language Translation with Unified Datasets and Pose-Based Transformers
Pedro Alejandro Dal Bianco | Oscar Agustín Stanchi | Facundo Manuel Quiroga | Franco Ronchetti

Sign languages are highly diverse across countries and regions, yet most Sign Language Translation (SLT) work remains monolingual. We explore a unified, multi-target SLT model trained jointly on four sign languages (German, Greek, Argentinian, Indian) using a standardized data layer. Our model operates on pose keypoints extracted with MediaPipe, yielding a lightweight and dataset-agnostic representation that is less sensitive to backgrounds, clothing, cameras, or signer identity while retaining motion and configuration cues. On RWTH-PHOENIX-Weather 2014T, the Greek Sign Language Dataset, LSA-T, and ISLTranslate, naive joint training under a fully shared parameterization performs worse than monolingual baselines; however, a simple two-stage schedule (multilingual pre-training followed by a short language-specific fine-tuning) recovers and surpasses monolingual results on three datasets (PHOENIX14T: +0.15 BLEU-4; GSL: +0.74; ISL: +0.10) while narrowing the gap on the most challenging corpus (LSA-T: -0.24 vs. monolingual). Scores span from BLEU-4 ≈ 1 on open-domain news (LSA-T) to >90 on constrained curricula (GSL), highlighting the role of dataset complexity. We release our code to facilitate training and evaluation of multilingual SLT models.
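A minimal Python sketch of the two-stage schedule described above, assuming placeholder dataset handles, epoch counts, and a train_one_epoch callable; this is not the released training code.

# Stage 1: joint multilingual pre-training on the pooled pose-keypoint datasets.
# Stage 2: a short language-specific fine-tuning pass per target sign language.
import copy

def two_stage_training(model, datasets, train_one_epoch,
                       pretrain_epochs=40, finetune_epochs=5):
    """datasets: dict like {'PHOENIX14T': ds_de, 'GSL': ds_el, 'LSA-T': ds_es, 'ISL': ds_isl}."""
    # Stage 1: fully shared parameters, batches drawn from all languages.
    pooled = [sample for ds in datasets.values() for sample in ds]
    for _ in range(pretrain_epochs):
        train_one_epoch(model, pooled)

    # Stage 2: branch one copy per language and fine-tune briefly on that language only.
    finetuned = {}
    for name, ds in datasets.items():
        lang_model = copy.deepcopy(model)
        for _ in range(finetune_epochs):
            train_one_epoch(lang_model, ds)
        finetuned[name] = lang_model
    return finetuned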

pdf bib
Continuous Fingerspelling Dataset for Indian Sign Language
Kirandevraj R | Vinod K. Kurmi | Vinay P. Namboodiri | C. V. Jawahar

Fingerspelling enables signers to represent proper nouns and technical terms letter-by-letter using manual alphabets, yet remains severely under-resourced for Indian Sign Language (ISL). We present the first continuous fingerspelling dataset for ISL, extracted from the ISH News YouTube channel, in which fingerspelling is accompanied by synchronized on-screen text cues. The dataset comprises 1,308 segments from 499 videos, totaling 70.85 minutes and 14,814 characters, with aligned video-text pairs capturing authentic coarticulation patterns. We validated the dataset quality through annotation using a proficient ISL interpreter, achieving a 90.67% exact match rate for 150 samples. We further established baseline recognition benchmarks using a ByT5-small encoder-decoder model, which attains 82.91% Character Error Rate after fine-tuning. This resource supports multiple downstream tasks, including fingerspelling transcription, temporal localization, and sign generation. The dataset is available at the following link: https://kirandevraj.github.io/ISL-Fingerspelling/.
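A small Python sketch of how the reported metrics can be computed: exact-match rate over annotated samples and Character Error Rate (CER) between predicted and reference spellings. It uses a plain Levenshtein implementation rather than any specific toolkit, and the example strings are toy data.

def levenshtein(a, b):
    """Edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def cer(references, hypotheses):
    """Total edit distance divided by total reference length."""
    dist = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return dist / max(1, sum(len(r) for r in references))

def exact_match_rate(references, hypotheses):
    return sum(r == h for r, h in zip(references, hypotheses)) / len(references)

# Toy example:
refs, hyps = ["MUMBAI", "DELHI"], ["MUMBAI", "DEHLI"]
print(exact_match_rate(refs, hyps), round(cer(refs, hyps), 3))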

pdf bib
Enhancing Indian Sign Language Translation via Motion-Aware Modeling
Anal Roy Chowdhury | Debarshi Kumar Sanyal

Sign language translation (SLT) has witnessed rapid progress in the deep learning community across several sign languages, including German, American, British, and Italian. However, Indian Sign Language (ISL) remains relatively underexplored. Motivated by recent efforts to develop large-scale ISL resources, we investigate how existing SLT models perform on ISL data. Specifically, we evaluate three approaches: (i) training a transformer-based model, (ii) leveraging visual-language pretraining, and (iii) tuning a language model with pre-trained visual and motion representations. Unlike existing methods that primarily use raw video frames, we augment the model with optical flow maps to explicitly capture motion primitives, combined with a multi-scale feature extraction method for encoding spatial features (SpaMo-OF). Our approach achieves promising results, obtaining a BLEU-4 score of 8.58 on the iSign test set, establishing a strong baseline for future ISL translation research.
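For illustration only, the following Python sketch shows one common way to obtain per-frame motion maps (dense optical flow via OpenCV's Farneback method) that could complement RGB features, in the spirit of the motion-aware modeling above; the paper's SpaMo-OF pipeline and its multi-scale feature extractor are not reproduced here.

import cv2
import numpy as np

def optical_flow_maps(frames_bgr):
    """frames_bgr: list of HxWx3 uint8 frames. Returns a list of HxWx2 flow maps."""
    flows = []
    prev = cv2.cvtColor(frames_bgr[0], cv2.COLOR_BGR2GRAY)
    for frame in frames_bgr[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense optical flow between consecutive frames (standard Farneback parameters).
        flow = cv2.calcOpticalFlowFarneback(prev, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow.astype(np.float32))
        prev = gray
    return flows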

pdf bib
Pose-Based Temporal Convolutional Networks for Isolated Indian Sign Language Word Recognition
Tatigunta Bhavi Teja Reddy | Vidhya Kamakshi

This paper presents a lightweight and efficient baseline for isolated Indian Sign Language (ISL) word recognition developed for the WSLP-AACL-2025 Shared Task. We propose a two-stage framework combining skeletal landmark extraction via MediaPipe Holistic with a Temporal Convolutional Network (TCN) for temporal sequence classification. The system processes pose-based input sequences instead of raw video, significantly reducing computation and memory costs. Trained on the WSLP-AACL-2025 dataset containing 4,398 isolated sign videos across 4,361 word classes, our model achieves 54% top-1 and 78% top-5 accuracy.
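A Python sketch of a pose-based TCN classifier in the spirit of the system above: MediaPipe Holistic landmarks are flattened per frame, and a stack of dilated 1D convolutions with global pooling predicts the word class. Layer sizes and hyperparameters are illustrative, not the submitted configuration.

import torch
import torch.nn as nn

class PoseTCN(nn.Module):
    def __init__(self, in_dim=543 * 3, num_classes=4361, channels=256, levels=4):
        super().__init__()
        layers, c_in = [], in_dim
        for i in range(levels):
            layers += [
                nn.Conv1d(c_in, channels, kernel_size=3,
                          dilation=2 ** i, padding=2 ** i),
                nn.BatchNorm1d(channels),
                nn.ReLU(),
            ]
            c_in = channels
        self.tcn = nn.Sequential(*layers)
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x):                     # x: (batch, time, in_dim) keypoint sequence
        h = self.tcn(x.transpose(1, 2))       # -> (batch, channels, time)
        return self.head(h.mean(dim=2))       # global average pooling over time

# model = PoseTCN(); logits = model(torch.randn(2, 64, 543 * 3))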

pdf bib
Cross-Linguistic Phonological Similarity Analysis in Sign Languages Using HamNoSys
Abhishek Bharadwaj Varanasi | Manjira Sinha | Tirthankar Dasgupta

This paper presents a cross-linguistic analysis of phonological similarity in sign languages using symbolic representations from the Hamburg Notation System (HamNoSys). We construct a dataset of 1,000 signs each from British Sign Language (BSL), German Sign Language (DGS), French Sign Language (LSF), and Greek Sign Language (GSL), and compute pairwise phonological similarity using normalized edit distance over HamNoSys strings. Our analysis reveals both universal and language-specific patterns in handshape usage, movement dynamics, non-manual features, and spatial articulation. We explore intra- and inter-language similarity distributions, phonological clustering, and co-occurrence structures across feature types. The findings offer insights into the structural organization of sign language phonology and highlight typological variation shaped by linguistic and cultural factors.
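A minimal Python sketch of the similarity measure described above: normalized Levenshtein edit distance over HamNoSys symbol strings, turned into a similarity in [0, 1]. The normalization choice (maximum string length) is one standard option and is an assumption here.

def edit_distance(a, b):
    """Levenshtein distance, treating each HamNoSys codepoint as one symbol."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def phonological_similarity(hamnosys_a, hamnosys_b):
    """1 - normalized edit distance; returns 1.0 for identical strings."""
    if not hamnosys_a and not hamnosys_b:
        return 1.0
    dist = edit_distance(hamnosys_a, hamnosys_b)
    return 1.0 - dist / max(len(hamnosys_a), len(hamnosys_b))

# Pairwise similarities can then be aggregated per language pair,
# e.g. phonological_similarity(bsl_sign, dgs_sign).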

pdf bib
Pose-Based Sign Language Spotting via an End-to-End Encoder Architecture
Samuel Ebimobowei Johnny | Blessed Guda | Emmanuel Aaron | Assane Gueye

Automatic Sign Language Recognition (ASLR) has emerged as a vital field for bridging the gap between deaf and hearing communities. However, the problem of sign-to-sign retrieval or detecting a specific sign within a sequence of continuous signs remains largely unexplored. We define this novel task as Sign Language Spotting. In this paper, we present a first step toward sign language retrieval by addressing the challenge of detecting the presence or absence of a query sign video within a sentence-level gloss or sign video. Unlike conventional approaches that rely on intermediate gloss recognition or text-based matching, we propose an end-to-end model that directly operates on pose keypoints extracted from sign videos. Our architecture employs an encoder-only backbone with a binary classification head to determine whether the query sign appears within the target sequence. By focusing on pose representations instead of raw RGB frames, our method significantly reduces computational cost and mitigates visual noise. We evaluate our approach on the Word Presence Prediction dataset from the WSLP 2025 shared task, achieving 61.88% accuracy and 60.00% F1-score. These results demonstrate the effectiveness of our pose-based framework for Sign Language Spotting, establishing a strong foundation for future research in automatic sign language retrieval and verification.
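A rough Python sketch of one encoder-only spotting design consistent with the description above: the query-sign pose sequence and the target sentence pose sequence are concatenated with a learned separator, encoded by a Transformer encoder, and a binary head predicts whether the query occurs in the target. Dimensions and the separator mechanism are assumptions, not the authors' exact architecture.

import torch
import torch.nn as nn

class SignSpotter(nn.Module):
    def __init__(self, pose_dim=543 * 3, d_model=256, nhead=4, layers=4):
        super().__init__()
        self.proj = nn.Linear(pose_dim, d_model)
        self.sep = nn.Parameter(torch.randn(1, 1, d_model))   # learned separator token
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(d_model, 1)                      # presence / absence logit

    def forward(self, query_pose, target_pose):
        # query_pose: (B, Tq, pose_dim); target_pose: (B, Tt, pose_dim)
        q, t = self.proj(query_pose), self.proj(target_pose)
        sep = self.sep.expand(q.size(0), -1, -1)
        h = self.encoder(torch.cat([q, sep, t], dim=1))
        return self.head(h.mean(dim=1)).squeeze(-1)            # mean-pool then classify

# logit = SignSpotter()(torch.randn(2, 30, 543 * 3), torch.randn(2, 120, 543 * 3))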

pdf bib
Finetuning Pre-trained Language Models for Bidirectional Sign Language Gloss to Text Translation
Arshia Kermani | Habib Irani | Vangelis Metsis

Sign Language Translation (SLT) is a crucial technology for fostering communication accessibility for the Deaf and Hard-of-Hearing (DHH) community. A dominant approach in SLT involves a two-stage pipeline: first, transcribing video to sign language glosses, and then translating these glosses into natural text. This second stage, gloss-to-text translation, is a challenging, low-resource machine translation task due to data scarcity and significant syntactic divergence. While prior work has often relied on training translation models from scratch, we show that fine-tuning large, pre-trained language models (PLMs) offers a more effective and data-efficient paradigm. In this work, we conduct a comprehensive bidirectional evaluation of several PLMs (T5, Flan-T5, mBART, and Llama) on this task. We use a collection of popular SLT datasets (RWTH-PHOENIX-14T, SIGNUM, and ASLG-PC12) and evaluate performance using standard machine translation metrics. Our results show that fine-tuned PLMs consistently and significantly outperform Transformer models trained from scratch, establishing new state-of-the-art results. Crucially, our bidirectional analysis reveals a significant performance gap, with Text-to-Gloss translation posing a greater challenge than Gloss-to-Text. We conclude that leveraging the linguistic knowledge of pre-trained models is a superior strategy for gloss translation and provides a more practical foundation for building robust, real-world SLT systems.
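A hedged Python sketch of the fine-tuning recipe above using Hugging Face Transformers: a pre-trained seq2seq PLM (T5-base is used here only as an example) is fine-tuned on gloss-to-text pairs, and swapping the columns gives the text-to-gloss direction. The toy examples and hyperparameters are placeholders, not the paper's settings.

from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "t5-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

pairs = Dataset.from_dict({                      # toy gloss/text pairs
    "gloss": ["TOMORROW RAIN NORTH", "TODAY SUN"],
    "text":  ["Tomorrow it will rain in the north.", "Today it is sunny."],
})

def preprocess(batch):
    enc = tokenizer(batch["gloss"], truncation=True, max_length=128)
    enc["labels"] = tokenizer(text_target=batch["text"],
                              truncation=True, max_length=128)["input_ids"]
    return enc

tokenized = pairs.map(preprocess, batched=True, remove_columns=["gloss", "text"])

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="gloss2text", num_train_epochs=3,
                                  per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()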