BSL-Hansard: A parallel, multimodal corpus of English and interpreted British Sign Language data from parliamentary proceedings

Euan McGill, Horacio Saggion


Abstract
BSL-Hansard is a novel open-source, multimodal resource created by combining video data in British Sign Language (BSL) with English text from the official transcripts of British parliamentary sessions. This paper describes the method used to compile BSL-Hansard, including time alignment of the text using the MAUS segmentation system (Schiel, 2015). It also gives some statistics about the dataset and suggests experiments. These primarily involve end-to-end Sign Language-to-text translation, but the dataset is also relevant to broader machine translation and to speech and language processing tasks.
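The abstract mentions time-aligning the Hansard text with MAUS before pairing it with the interpreted BSL video. The following is a hedged illustration of one downstream step such a pipeline could need, not the authors' published method: turning word-level timings into per-sentence video clips. It assumes the alignments have already been parsed into (start, end, token) tuples and that sentence boundaries come from the transcript. The function name `sentence_clips`, the tuple layout and the 25 fps default frame rate are all hypothetical, and the toy words are invented.

```python
import math
from typing import List, Sequence, Tuple

# One aligned word: (start_seconds, end_seconds, token).
# This tuple layout is an assumption, not MAUS's own output format.
Word = Tuple[float, float, str]


def sentence_clips(
    words: Sequence[Word],
    sentence_lengths: Sequence[int],
    fps: float = 25.0,  # hypothetical video frame rate
) -> List[Tuple[int, int, str]]:
    """Group word timings into per-sentence video clips.

    sentence_lengths[k] is the number of words in the k-th transcript
    sentence; words must be in transcript order. Returns
    (first_frame, last_frame, sentence_text) with inclusive frame indices.
    """
    if sum(sentence_lengths) != len(words):
        raise ValueError("sentence lengths must cover every aligned word")
    clips: List[Tuple[int, int, str]] = []
    pos = 0
    for n in sentence_lengths:
        chunk = words[pos:pos + n]
        pos += n
        if not chunk:
            continue  # skip empty sentences
        start_s, end_s = chunk[0][0], chunk[-1][1]
        first = math.floor(start_s * fps)
        # The end time is an exclusive boundary, so the last frame is the
        # one that starts before end_s (never earlier than the first frame).
        last = max(first, math.ceil(end_s * fps) - 1)
        text = " ".join(token for _, _, token in chunk)
        clips.append((first, last, text))
    return clips


# Invented toy alignment: two sentences of two words each.
words = [
    (0.0, 0.5, "Order"),
    (0.5, 1.0, "order"),
    (1.25, 1.5, "the"),
    (1.5, 2.0, "Minister"),
]
print(sentence_clips(words, [2, 2]))
# [(0, 24, 'Order order'), (31, 49, 'the Minister')]
```

Each clip's frame range could then be used to cut the corresponding segment from the interpreted video, giving one (video clip, English sentence) pair per transcript sentence.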
Anthology ID:
2023.at4ssl-1.5
Volume:
Proceedings of the Second International Workshop on Automatic Translation for Signed and Spoken Languages
Month:
June
Year:
2023
Address:
Tampere, Finland
Editors:
Dimitar Shterionov, Mirella De Sisto, Mathias Müller, Davy Van Landuyt, Rehana Omardeen, Shaun O'Boyle, Annelies Braffort, Floris Roelofsen, Fred Blain, Bram Vanroy, Eleftherios Avramidis
Venue:
AT4SSL
Publisher:
European Association for Machine Translation
Pages:
38–43
URL:
https://aclanthology.org/2023.at4ssl-1.5
Cite (ACL):
Euan McGill and Horacio Saggion. 2023. BSL-Hansard: A parallel, multimodal corpus of English and interpreted British Sign Language data from parliamentary proceedings. In Proceedings of the Second International Workshop on Automatic Translation for Signed and Spoken Languages, pages 38–43, Tampere, Finland. European Association for Machine Translation.
Cite (Informal):
BSL-Hansard: A parallel, multimodal corpus of English and interpreted British Sign Language data from parliamentary proceedings (McGill & Saggion, AT4SSL 2023)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2023.at4ssl-1.5.pdf