KoSign Sign Language Translation Project: Introducing The NIASL2021 Dataset
Mathew Huerta-Enochian, Du Hui Lee, Hye Jin Myung, Kang Suk Byun, Jun Woo Lee
Abstract
We introduce a new sign language production (SLP) and sign language translation (SLT) dataset, NIASL2021, consisting of 201,026 Korean-KSL data pairs. KSL translations of Korean source texts are represented in three formats: video recordings, keypoint position data, and time-aligned gloss annotations for each hand (using a 7,989 sign vocabulary) and for eight different non-manual signals (NMS). We evaluated our sign language elicitation methodology and found that text-based prompting had a negative effect on translation quality in terms of naturalness and comprehension. We recommend distilling text into a visual medium before translating into sign language or adding a prompt-blind review step to text-based translation methodologies.
- Anthology ID: 2022.sltat-1.9
- Volume: Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives
- Month: June
- Year: 2022
- Address: Marseille, France
- Venue: SLTAT
- Publisher: European Language Resources Association
- Pages: 59–66
- URL: https://aclanthology.org/2022.sltat-1.9
- Cite (ACL): Mathew Huerta-Enochian, Du Hui Lee, Hye Jin Myung, Kang Suk Byun, and Jun Woo Lee. 2022. KoSign Sign Language Translation Project: Introducing The NIASL2021 Dataset. In Proceedings of the 7th International Workshop on Sign Language Translation and Avatar Technology: The Junction of the Visual and the Textual: Challenges and Perspectives, pages 59–66, Marseille, France. European Language Resources Association.
- Cite (Informal): KoSign Sign Language Translation Project: Introducing The NIASL2021 Dataset (Huerta-Enochian et al., SLTAT 2022)
- PDF: https://preview.aclanthology.org/remove-xml-comments/2022.sltat-1.9.pdf
- Data: How2Sign, RWTH-PHOENIX-Weather 2014 T