Sydelle De Souza


2025

pdf bib
Behavioural vs. Representational Systematicity in End-to-End Models: An Opinionated Survey
Ivan Vegner | Sydelle De Souza | Valentin Forch | Martha Lewis | Leonidas A. A. Doumas
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A core aspect of compositionality, systematicity is a desirable property in ML models as it enables strong generalization to novel contexts. This has led to numerous studies proposing benchmarks to assess systematic generalization, as well as models and training regimes designed to enhance it. Many of these efforts are framed as addressing the challenge posed by Fodor and Pylyshyn. However, while they argue for systematicity of representations, existing benchmarks and models primarily focus on the systematicity of behaviour. We emphasize the crucial nature of this distinction. Furthermore, building on Hadley’s (1994) taxonomy of systematic generalization, we analyze the extent to which behavioural systematicity is tested by key benchmarks in the literature across language and vision. Finally, we highlight ways of assessing systematicity of representations in ML models as practiced in the field of mechanistic interpretability.

pdf bib
What does memory retrieval leave on the table? Modelling the Cost of Semi-Compositionality with MINERVA2 and sBERT
Sydelle De Souza | Ivan Vegner | Francis Mollica | Leonidas A. A. Doumas
Proceedings of the 29th Conference on Computational Natural Language Learning

Despite being ubiquitous in natural language, collocations (e.g., kick+habit) incur a unique processing cost, compared to compositional phrases (kick+door) and idioms (kick+bucket). We confirm this cost with behavioural data as well as MINERVA2, a memory model, suggesting that collocations constitute a distinct linguistic category. While the model fails to fully capture the observed human processing patterns, we find that below a specific item frequency threshold, the model’s retrieval failures align with human reaction times across conditions. This suggests an alternative processing mechanism that activates when memory retrieval fails.

2024

pdf bib
Semantics and Sentiment: Cross-lingual Variations in Emoji Use
Giulio Zhou | Sydelle De Souza | Ella Markham | Oghenetekevwe Kwakpovwe | Sumin Zhao
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Over the past decade, the use of emojis in social media has seen a rapid increase. Despite their popularity and image-grounded nature, previous studies have found that people interpret emojis inconsistently when presented in context and in isolation. In this work, we explore whether emoji semantics differ across languages and how semantics interacts with sentiment in emoji use across languages. To do so, we developed a corpus containing the literal meanings for a set of emojis, as defined by L1 speakers in English, Portuguese and Chinese. We then use these definitions to assess whether speakers of different languages agree on whether an emoji is being used literally or figuratively in the context where they are grounded in, as well as whether this literal and figurative use correlates with the sentiment of the context itself. We found that there were varying levels of disagreement on the definition for each emoji but that these stayed fairly consistent across languages. We also demonstrated a correlation between the sentiment of a tweet and the figurative use of an emoji, providing theoretical underpinnings for empirical results in NLP tasks, particularly offering insights that can benefit sentiment analysis models.