Recurrent babbling: evaluating the acquisition of grammar from limited input data

Ludovica Pannitto, Aurélie Herbelot


Abstract
Recurrent Neural Networks (RNNs) have been shown to capture various aspects of syntax from raw linguistic input. In most previous experiments, however, learning happens over unrealistic corpora, which do not reflect the type and amount of data a child would be exposed to. This paper remedies this state of affairs by training an LSTM over a realistically sized subset of child-directed input. The behaviour of the network is analysed over time using a novel methodology which consists in quantifying the level of grammatical abstraction in the model’s generated output (its ‘babbling’), compared to the language it has been exposed to. We show that the LSTM indeed abstracts new structures as learning proceeds.
Anthology ID:
2020.conll-1.13
Volume:
Proceedings of the 24th Conference on Computational Natural Language Learning
Month:
November
Year:
2020
Address:
Online
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
165–176
URL:
https://aclanthology.org/2020.conll-1.13
DOI:
10.18653/v1/2020.conll-1.13
Cite (ACL):
Ludovica Pannitto and Aurélie Herbelot. 2020. Recurrent babbling: evaluating the acquisition of grammar from limited input data. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 165–176, Online. Association for Computational Linguistics.
Cite (Informal):
Recurrent babbling: evaluating the acquisition of grammar from limited input data (Pannitto & Herbelot, CoNLL 2020)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2020.conll-1.13.pdf
Optional supplementary material:
 2020.conll-1.13.OptionalSupplementaryMaterial.zip
Data
OpenSubtitles