Raquel G. Alhama


Word Segmentation as Unsupervised Constituency Parsing
Raquel G. Alhama
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Word identification from continuous input is typically viewed as a segmentation task. Experiments with human adults suggest that familiarity with syntactic structures in their native language also influences word identification in artificial languages; however, the relation between syntactic processing and word identification is yet unclear. This work takes one step forward by exploring a radically different approach of word identification, in which segmentation of a continuous input is viewed as a process isomorphic to unsupervised constituency parsing. Besides formalizing the approach, this study reports simulations of human experiments with DIORA (Drozdov et al., 2020), a neural unsupervised constituency parser. Results show that this model can reproduce human behavior in word identification experiments, suggesting that this is a viable approach to study word identification and its relation to syntactic processing.


Retrodiction as Delayed Recurrence: the Case of Adjectives in Italian and English
Raquel G. Alhama | Francesca Zermiani | Atiqah Khaliq
Proceedings of the The 19th Annual Workshop of the Australasian Language Technology Association

We address the question of how to account for both forward and backward dependencies in an online processing account of human language acquisition. We focus on descriptive adjectives in English and Italian, and show that the acquisition of adjectives in these languages likely relies on tracking both forward and backward regularities. Our simulations confirm that forward-predicting models like standard Recurrent Neural Networks (RNN) cannot account for this phenomenon due to the lack of backward prediction, but the addition of a small delay (as proposed in Turek et al., 2019) endows the RNN with the ability to not only predict but also retrodict.


Evaluating Word Embeddings for Language Acquisition
Raquel G. Alhama | Caroline Rowland | Evan Kidd
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Continuous vector word representations (or word embeddings) have shown success in capturing semantic relations between words, as evidenced with evaluation against behavioral data of adult performance on semantic tasks (Pereira et al. 2016). Adult semantic knowledge is the endpoint of a language acquisition process; thus, a relevant question is whether these models can also capture emerging word representations of young language learners. However, the data of semantic knowledge of children is scarce or non-existent for some age groups. In this paper, we propose to bridge this gap by using Age of Acquisition norms to evaluate word embeddings learnt from child-directed input. We present two methods that evaluate word embeddings in terms of (a) the semantic neighbourhood density of learnt words, and (b) the convergence to adult word associations. We apply our methods to bag-of-words models, and we find that (1) children acquire words with fewer semantic neighbours earlier, and (2) young learners only attend to very local context. These findings provide converging evidence for validity of our methods in understanding the prerequisite features for a distributional model of word learning.


Neural Discontinuous Constituency Parsing
Miloš Stanojević | Raquel G. Alhama
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

One of the most pressing issues in discontinuous constituency transition-based parsing is that the relevant information for parsing decisions could be located in any part of the stack or the buffer. In this paper, we propose a solution to this problem by replacing the structured perceptron model with a recursive neural model that computes a global representation of the configuration, therefore allowing even the most remote parts of the configuration to influence the parsing decisions. We also provide a detailed analysis of how this representation should be built out of sub-representations of its core elements (words, trees and stack). Additionally, we investigate how different types of swap oracles influence the results. Our model is the first neural discontinuous constituency parser, and it outperforms all the previously published models on three out of four datasets while on the fourth it obtains second place by a tiny difference.


Generalization in Artificial Language Learning: Modelling the Propensity to Generalize
Raquel G. Alhama | Willem Zuidema
Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning