How Much Data Is Enough Data? A New Motion Capture Corpus for Probabilistic Sign Language Generation
Anna Klezovich, Johanna Mesch, Gustav Eje Henter, Jonas Beskow
Abstract
We present STS Mocap v1, a new 4.1-hour, high-quality motion capture dataset for Swedish Sign Language. The dataset consists of multimodal data: the body tracked with markers, the fingers tracked with Manus Quantum Metagloves, the face tracked with the iPhone LiveLink app in MetaHuman Animator mode, and corresponding sentence-level text translations into spoken Swedish. Using this dataset, we show that four hours of motion capture data is enough for generative modeling of sign language conditioned on 2D pose. In comparison, training the same flow-matching model on only 30 minutes of this data, a common size for sign language motion capture datasets, significantly degrades the quality of the synthesized data.
- Anthology ID: 2026.lrec-main.750
- Volume: Proceedings of the Fifteenth Language Resources and Evaluation Conference
- Month: May
- Year: 2026
- Address: Palma de Mallorca, Spain
- Editors: Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
- Venue: LREC
- Publisher: ELRA Language Resource Association
- Pages: 9549–9558
- URL: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.750/
- Cite (ACL): Anna Klezovich, Johanna Mesch, Gustav Eje Henter, and Jonas Beskow. 2026. How Much Data Is Enough Data? A New Motion Capture Corpus for Probabilistic Sign Language Generation. International Conference on Language Resources and Evaluation, main:9549–9558.
- Cite (Informal): How Much Data Is Enough Data? A New Motion Capture Corpus for Probabilistic Sign Language Generation (Klezovich et al., LREC 2026)
- PDF: https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.750.pdf