SIGTYP 2021 Shared Task: Robust Spoken Language Identification
Elizabeth Salesky, Badr M. Abdullah, Sabrina Mielke, Elena Klyachko, Oleg Serikov, Edoardo Maria Ponti, Ritesh Kumar, Ryan Cotterell, Ekaterina Vylomova
Abstract
While language identification is a fundamental speech and language processing task, it remains challenging for many languages and language families. For many low-resource and endangered languages this is in part due to resource availability: where larger datasets exist, they may be single-speaker or drawn from different domains than the desired application scenarios, creating a need for domain- and speaker-invariant language identification systems. This year’s shared task on robust spoken language identification sought to investigate just this scenario: systems were to be trained on largely single-speaker speech from one domain, but evaluated on data in other domains recorded from speakers under different recording circumstances, mimicking realistic low-resource scenarios. We see that domain and speaker mismatch proves very challenging for current methods, which can perform above 95% accuracy in-domain. Domain adaptation can address this to some degree, but these conditions merit further investigation to make spoken language identification accessible in many scenarios.
- Anthology ID:
- 2021.sigtyp-1.11
- Volume:
- Proceedings of the Third Workshop on Computational Typology and Multilingual NLP
- Month:
- June
- Year:
- 2021
- Address:
- Online
- Editors:
- Ekaterina Vylomova, Elizabeth Salesky, Sabrina Mielke, Gabriella Lapesa, Ritesh Kumar, Harald Hammarström, Ivan Vulić, Anna Korhonen, Roi Reichart, Edoardo Maria Ponti, Ryan Cotterell
- Venue:
- SIGTYP
- SIG:
- SIGTYP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 122–129
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2021.sigtyp-1.11/
- DOI:
- 10.18653/v1/2021.sigtyp-1.11
- Cite (ACL):
- Elizabeth Salesky, Badr M. Abdullah, Sabrina Mielke, Elena Klyachko, Oleg Serikov, Edoardo Maria Ponti, Ritesh Kumar, Ryan Cotterell, and Ekaterina Vylomova. 2021. SIGTYP 2021 Shared Task: Robust Spoken Language Identification. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, pages 122–129, Online. Association for Computational Linguistics.
- Cite (Informal):
- SIGTYP 2021 Shared Task: Robust Spoken Language Identification (Salesky et al., SIGTYP 2021)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2021.sigtyp-1.11.pdf