Mauricio Gruppi

Also published as: Maurício Gruppi


2026

This paper describes Mendel292, our system for SemEval-2026 Task 4 on Narrative Story Similarity. We introduce a narrative encoder that decomposes story representations into explicit subspaces for abstract theme, course of action, and outcome, built on a pre-trained sentence embedding model and a trainable BiLSTM projection layer with a triplet margin loss objective. We augment the training set via backtranslation and incorporate weakly supervised multi-task objectives derived from unsupervised narrative clustering. The proposed architecture was designed to learn a latent representation of narratives in a few-shot setting due to the limited amount of training data. Despite using a rich pre-trained transformer, the model was outperformed by an unsupervised pooling approach on the classification task. While our systems do not match the top leaderboard scores, they allow us to systematically study the effects of subspace factorization, weak labels, and data augmentation on narrative similarity modeling.
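For readers unfamiliar with the triplet margin loss objective mentioned in this abstract, the following is a minimal sketch of the standard formulation, max(0, d(a, p) - d(a, n) + margin). It is not the Mendel292 implementation: the Euclidean distance, the default margin of 1.0, and the function names are assumptions chosen for illustration, and the subspace decomposition and BiLSTM projection are not reproduced here.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is. The margin value is an assumption.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-d "story embeddings" (hypothetical values).
anchor = [0.0, 0.0]
similar_story = [0.0, 1.0]
far_story = [3.0, 4.0]
near_story = [0.0, 1.5]

loss_easy = triplet_margin_loss(anchor, similar_story, far_story)
loss_hard = triplet_margin_loss(anchor, similar_story, near_story)
```

In this toy case the far negative already satisfies the margin, so only the near negative contributes a positive loss.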

2025

We introduce ConShift, a family of alignment-based algorithms that enable semantic variation analysis at the sense-level. Using independent senses of words induced from the context of tokens in two corpora, sense-enriched word embeddings are aligned using self-supervision and a flexible matching mechanism. This approach makes it possible to test for multiple sense-level language variations such as sense gain/presence, loss/absence and broadening/narrowing, while providing explanation of the changes through visualization of related concepts. We illustrate the utility of the method with sense- and word-level semantic shift detection results for multiple evaluation datasets in diachronic settings and dialect variation in the synchronic setting.
Text-based reinforcement learning involves an agent interacting with a fictional environment using observed text and admissible actions in natural language to complete a task. Previous works have shown that agents can succeed in text-based interactive environments even in the complete absence of semantic understanding or other linguistic capabilities. The success of these agents in playing such games suggests that semantic understanding may not be important for the task. This raises an important question about the benefits of LMs in guiding the agents through the game states. In this work, we show that rich semantic understanding leads to efficient training of text-based RL agents. Moreover, we describe the occurrence of semantic degeneration as a consequence of inappropriate fine-tuning of language models in text-based reinforcement learning (TBRL). Specifically, we describe the shift in the semantic representation of words in the LM, as well as how it affects the performance of the agent in tasks that are semantically similar to the training games. These results may help develop better strategies to fine-tune agents in text-based RL scenarios.

2020

This paper describes SChME (Semantic Change Detection with Model Ensemble), a method used in SemEval-2020 Task 1 on unsupervised detection of lexical semantic change. SChME uses a model ensemble combining signals from distributional models (word embeddings) and word frequency, where each model casts a vote indicating the probability that a word suffered semantic change according to that feature. More specifically, we combine the cosine distance of word vectors with a neighborhood-based metric we named Mapped Neighborhood Distance (MAP) and a word frequency differential metric as input signals to our model. Additionally, we explore alignment-based methods to investigate the importance of the landmarks used in this process. Our results show evidence that the number of landmarks used for alignment has a direct impact on the predictive performance of the model. Moreover, we show that languages that suffer less semantic change tend to benefit from using a large number of landmarks, whereas languages with more semantic change benefit from a more careful choice of landmark number for alignment.
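The voting idea in this abstract can be illustrated with a small sketch. It is not the SChME implementation: the rank-based conversion of raw scores into votes, the unweighted average, the function names, and the toy numbers are all assumptions made for illustration, and the MAP metric and embedding alignment are omitted.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def to_votes(scores):
    """Turn raw per-word scores into votes in (0, 1].

    Assumption: a word's vote is the fraction of words whose score is
    less than or equal to its own (an empirical CDF over the vocabulary).
    """
    values = list(scores.values())
    n = len(values)
    return {w: sum(1 for x in values if x <= s) / n for w, s in scores.items()}


def soft_vote(*feature_scores):
    """Average the per-feature votes for each word (unweighted, an assumption)."""
    votes = [to_votes(scores) for scores in feature_scores]
    return {w: sum(v[w] for v in votes) / len(votes) for w in votes[0]}


# Hypothetical per-word signals between two time periods.
cosine_scores = {"plane": 0.8, "tree": 0.1, "cell": 0.5}
frequency_diff = {"plane": 0.3, "tree": 0.05, "cell": 0.6}

ensemble = soft_vote(cosine_scores, frequency_diff)
least_changed = min(ensemble, key=ensemble.get)
```

A threshold on the averaged vote would then give a binary changed/unchanged decision; how that threshold is chosen is left open here.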