Language ID Prediction from Speech Using Self-Attentive Pooling

Roman Bedyakin; Nikolay Mikhaylovskiy

doi:10.18653/v1/2021.sigtyp-1.12

Abstract

This memo describes the NTR-TSU submission to the SIGTYP 2021 Shared Task on predicting language IDs from speech. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) pipeline. For many low-resource and endangered languages, only single-speaker recordings may be available, which calls for language ID systems that are invariant to domain and speaker. We show that a convolutional neural network with a Self-Attentive Pooling layer achieves promising results on the language identification task.
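As an illustration of the pooling idea named in the abstract, the following is a minimal NumPy sketch of a generic self-attentive pooling layer: each frame-level feature vector receives a learned scalar score, the scores are normalized with a softmax over time, and the utterance-level vector is the weighted sum of the frames. The function name `self_attentive_pooling`, the parameters `W`, `b`, `v`, and all dimensions are assumptions chosen for illustration; this is the commonly used formulation, not necessarily the exact layer or configuration used in the submission.

```python
import numpy as np

def self_attentive_pooling(frames, W, b, v):
    """Collapse a (T, D) sequence of frame features into one (D,) vector.

    W (D, H), b (H,) and v (H,) are hypothetical learned parameters;
    here they are passed in explicitly instead of being trained.
    """
    hidden = np.tanh(frames @ W + b)   # (T, H) per-frame hidden representation
    scores = hidden @ v                # (T,) one scalar score per frame
    scores = scores - scores.max()     # shift for a numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum()           # (T,) attention weights, summing to 1
    pooled = weights @ frames          # (D,) attention-weighted mean of frames
    return pooled, weights

# Toy usage with random values standing in for CNN outputs.
rng = np.random.default_rng(0)
T, D, H = 50, 8, 4                     # frames, feature size, attention size
frames = rng.normal(size=(T, D))
W = rng.normal(size=(D, H))
b = np.zeros(H)
v = rng.normal(size=H)
pooled, weights = self_attentive_pooling(frames, W, b, v)
```

Because the weights are computed from the input itself, the output has a fixed size regardless of utterance length, and uninformative frames can be down-weighted. With all scores equal (for example `v` set to zeros) the layer reduces to plain average pooling.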

Anthology ID: 2021.sigtyp-1.12
Volume: Proceedings of the Third Workshop on Computational Typology and Multilingual NLP
Month: June
Year: 2021
Address: Online
Venue: SIGTYP
SIG: SIGTYP
Publisher: Association for Computational Linguistics
Pages: 130–135
URL: https://aclanthology.org/2021.sigtyp-1.12
DOI: 10.18653/v1/2021.sigtyp-1.12
Cite (ACL): Roman Bedyakin and Nikolay Mikhaylovskiy. 2021. Language ID Prediction from Speech Using Self-Attentive Pooling. In Proceedings of the Third Workshop on Computational Typology and Multilingual NLP, pages 130–135, Online. Association for Computational Linguistics.
Cite (Informal): Language ID Prediction from Speech Using Self-Attentive Pooling (Bedyakin & Mikhaylovskiy, SIGTYP 2021)
PDF: https://preview.aclanthology.org/auto-file-uploads/2021.sigtyp-1.12.pdf
