On The Performance of Time-Pooling Strategies for End-to-End Spoken Language Identification

Joao Monteiro, Md Jahangir Alam, Tiago Falk


Abstract
Automatic speech processing applications often have to aggregate local descriptors (i.e., representations of input speech data corresponding to specific portions along the time dimension) into a single fixed-dimension representation, known as a global descriptor, on top of which downstream classification tasks can be performed. In this paper, we provide an empirical assessment of different time-pooling strategies when used with state-of-the-art representation learning models. In particular, insights are provided as to when it is suitable to use simple statistics of local descriptors and when more sophisticated approaches are needed. Language identification is used as a case study, with a database containing ten oriental languages under varying test conditions (short-duration test recordings, confusing languages, unseen languages). Experiments are performed with classifiers trained on top of global descriptors to provide insights on open-set evaluation performance, and they show that appropriate selection of such pooling strategies yields embeddings able to outperform well-known benchmark systems as well as previous results based on attention only.
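To make the distinction between "simple statistics" and attention-based pooling concrete, the following is a minimal NumPy sketch and not the paper's implementation. It assumes local descriptors arrive as a (time, features) array; the function names, the single `scoring_vector` used to produce attention weights, and the random inputs are all hypothetical choices made for illustration.

```python
import numpy as np


def statistics_pooling(local_descriptors):
    """Concatenate per-dimension mean and standard deviation over time.

    local_descriptors: array of shape (T, D), one row per time step.
    Returns a global descriptor of shape (2 * D,), independent of T.
    """
    mean = local_descriptors.mean(axis=0)
    std = local_descriptors.std(axis=0)
    return np.concatenate([mean, std])


def attention_pooling(local_descriptors, scoring_vector):
    """Weighted average over time with softmax-normalized scores.

    scoring_vector: hypothetical parameter of shape (D,); in a trained
    model it would be learned, here it is simply supplied by the caller.
    Returns a global descriptor of shape (D,), independent of T.
    """
    scores = local_descriptors @ scoring_vector      # shape (T,)
    scores = scores - scores.max()                   # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over time
    return weights @ local_descriptors               # shape (D,)


# Illustrative inputs: two "utterances" of different lengths, 8 features each.
rng = np.random.default_rng(0)
short_utterance = rng.standard_normal((50, 8))
long_utterance = rng.standard_normal((300, 8))
scoring_vector = rng.standard_normal(8)

for utterance in (short_utterance, long_utterance):
    stats = statistics_pooling(utterance)
    attended = attention_pooling(utterance, scoring_vector)
    print(utterance.shape, stats.shape, attended.shape)
```

Both functions map recordings of any duration to a fixed-size vector, which is the property a downstream classifier needs. With an all-zero scoring vector the attention weights become uniform, so attention pooling reduces to the plain temporal mean.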
Anthology ID:
2020.lrec-1.438
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
3566–3572
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.438
Cite (ACL):
Joao Monteiro, Md Jahangir Alam, and Tiago Falk. 2020. On The Performance of Time-Pooling Strategies for End-to-End Spoken Language Identification. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3566–3572, Marseille, France. European Language Resources Association.
Cite (Informal):
On The Performance of Time-Pooling Strategies for End-to-End Spoken Language Identification (Monteiro et al., LREC 2020)
PDF:
https://preview.aclanthology.org/ingest-acl-2023-videos/2020.lrec-1.438.pdf