Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech

William N. Havard; Jean-Pierre Chevrot; Laurent Besacier

doi:10.18653/v1/K19-1032

Abstract

In this paper, we study how word-like units are represented and activated in a recurrent neural model of visually grounded speech. The model used in our experiments is trained to project an image and its spoken description into a common representation space. We show that a recurrent model trained on spoken sentences implicitly segments its input into word-like units and reliably maps them to their correct visual referents. We introduce the gating paradigm, a methodology originating from linguistics, to analyse the representations learned by neural networks, and show that the correct representation of a word is only activated if the network has access to the first phoneme of the target word, suggesting that the network does not rely on a global acoustic pattern. Furthermore, we find that not all speech frames (MFCC vectors in our case) play an equal role in the final encoded representation of a given word; some frames have a crucial effect on it. Finally, we suggest that word representations could be activated through a process of lexical competition.
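The gating idea mentioned in the abstract can be illustrated with a minimal sketch: feed an encoder progressively longer prefixes of a word's frame sequence and compare each partial encoding with the encoding of the full word. Everything below is an assumption made for illustration, not the authors' implementation: the toy recurrent encoder (`make_toy_encoder`), the choice of 13 coefficients per frame, gating forward from the word onset, and cosine similarity as the comparison measure are all hypothetical stand-ins.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def make_toy_encoder(n_features, n_hidden, seed=0):
    """Build a toy recurrent encoder with fixed random weights.

    Hypothetical stand-in for a trained speech encoder: it only serves
    to make the gating procedure runnable.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(n_hidden, n_features))
    U = rng.normal(scale=0.5, size=(n_hidden, n_hidden))

    def encode(frames):
        h = np.zeros(n_hidden)
        for x in frames:  # one recurrent update per speech frame
            h = np.tanh(W @ x + U @ h)
        return h  # final hidden state used as the encoding

    return encode


def gating_curve(frames, encode, step=1):
    """Return (gate, similarity) pairs for growing prefixes of `frames`.

    Each gate g keeps the first g frames; the similarity compares the
    encoding of that prefix with the encoding of the complete sequence.
    """
    target = encode(frames)
    gates = list(range(step, len(frames) + 1, step))
    if not gates or gates[-1] != len(frames):
        gates.append(len(frames))  # always include the full word
    return [(g, cosine(encode(frames[:g]), target)) for g in gates]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    word_frames = rng.normal(size=(20, 13))  # 20 frames, 13 coefficients (assumed)
    encoder = make_toy_encoder(n_features=13, n_hidden=8)
    for gate, sim in gating_curve(word_frames, encoder, step=5):
        print(f"first {gate:2d} frames: similarity to full word = {sim:.3f}")
```

With a trained model in place of the toy encoder, the shape of such a curve would indicate how much of the word the network needs before its representation approaches that of the complete word.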

Anthology ID: K19-1032
Volume: Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
Month: November
Year: 2019
Address: Hong Kong, China
Editors: Mohit Bansal, Aline Villavicencio
Venue: CoNLL
SIG: SIGNLL
Publisher: Association for Computational Linguistics
Pages: 339–348
URL: https://preview.aclanthology.org/jlcl-multiple-ingestion/K19-1032/
DOI: 10.18653/v1/K19-1032
Cite (ACL): William N. Havard, Jean-Pierre Chevrot, and Laurent Besacier. 2019. Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 339–348, Hong Kong, China. Association for Computational Linguistics.
Cite (Informal): Word Recognition, Competition, and Activation in a Model of Visually Grounded Speech (Havard et al., CoNLL 2019)
PDF: https://preview.aclanthology.org/jlcl-multiple-ingestion/K19-1032.pdf
Supplementary material: K19-1032.Supplementary_Material.pdf
Data: MS COCO
