Korean Disaster Safety Information Sign Language Translation Benchmark Dataset

Wooyoung Kim, TaeYong Kim, Byeongjin Kim, Myeong Jin MJ Lee, Gitaek Lee, Kirok Kim, Jisoo Cha, Wooju Kim


Abstract
Sign language is a crucial means of communication for deaf communities. However, people outside these communities often lack an understanding of sign language, which limits communication accessibility for deaf people. Sign language translation is therefore an important research area. In this context, we present a new benchmark dataset for Korean sign language translation named SSL: Korean disaster Safety information Sign Language translation benchmark dataset. The Korean sign language translation datasets provided by the National Information Society Agency in South Korea present challenges related to computational resources, heterogeneity between the train and test sets, and unrefined data. To alleviate these issues, we refine the original data and release the refined version. Additionally, we report experimental results for a baseline using a transformer architecture. We empirically demonstrate that baseline performance varies depending on the tokenization method applied to gloss sequences. In particular, tokenization based on the characteristics of sign language outperforms both tokenization based on the characteristics of spoken language and tokenization using statistical techniques. We release materials at https://github.com/SSL-Sign-Language/Korean-Disaster-Safety-Information-Sign-Language-Translation-Benchmark-Dataset
Anthology ID:
2024.lrec-main.869
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
9948–9953
URL:
https://aclanthology.org/2024.lrec-main.869
Cite (ACL):
Wooyoung Kim, TaeYong Kim, Byeongjin Kim, Myeong Jin MJ Lee, Gitaek Lee, Kirok Kim, Jisoo Cha, and Wooju Kim. 2024. Korean Disaster Safety Information Sign Language Translation Benchmark Dataset. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 9948–9953, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Korean Disaster Safety Information Sign Language Translation Benchmark Dataset (Kim et al., LREC-COLING 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2024.lrec-main.869.pdf