Lawal Abdulmujeeb


2026

We present our system for SemEval-2026 Task 1, Subtask A, a humor generation task in which systems must generate jokes from either a news headline or a word pair. Our approach uses the Llama-3.1-8B-Instruct model, which we selected after comparing several candidate models and humor strategies in our experiments. For headline inputs, we used a two-shot prompt that frames the output as a tweet; specifying the tone proved to be a particularly important factor in output quality. For word-pair inputs, we instructed the model to commit to a single everyday situation and generate a funny thought based on it. During experimentation, we noticed that models would start a joke one way with the first word and then abruptly shift context mid-joke just to include the second word; committing to a single situation helped mitigate this. We also made use of a persona here, specifically Dave Chappelle. Our final system tied for 2nd place with 3 other systems out of 32 and achieved an Elo score of 1020. That these results were achieved with no fine-tuning suggests that careful prompt design alone can yield competitive results.
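The two prompt styles described in this abstract can be illustrated with a minimal sketch. The prompt wording, the function names, the default tone, and the placeholder examples are all assumptions made for illustration; the abstract does not give the actual prompts, and no model call is shown.

```python
from typing import List, Tuple


def build_headline_prompt(
    headline: str,
    shots: List[Tuple[str, str]],
    tone: str = "dry and sarcastic",  # hypothetical default; the abstract only says tone matters
) -> str:
    """Assemble a two-shot prompt that frames the joke as a tweet."""
    if len(shots) != 2:
        raise ValueError("a two-shot prompt needs exactly two examples")
    lines = [f"Write a funny tweet reacting to the news headline. Tone: {tone}.", ""]
    for example_headline, example_tweet in shots:
        lines += [f"Headline: {example_headline}", f"Tweet: {example_tweet}", ""]
    # The final, unanswered pair is what the model is asked to complete.
    lines += [f"Headline: {headline}", "Tweet:"]
    return "\n".join(lines)


def build_word_pair_prompt(word_a: str, word_b: str, persona: str = "Dave Chappelle") -> str:
    """Ask for one joke anchored in a single everyday situation, told in a persona's voice."""
    return (
        f"You are {persona}. Pick ONE everyday situation where both "
        f"'{word_a}' and '{word_b}' naturally belong. Stay in that situation "
        f"for the whole joke and share one funny thought that uses both words."
    )


# Placeholder shots; real examples would be hand-picked headline/tweet pairs.
example_shots = [
    ("<example headline 1>", "<example tweet 1>"),
    ("<example headline 2>", "<example tweet 2>"),
]
headline_prompt = build_headline_prompt("<input headline>", example_shots)
word_pair_prompt = build_word_pair_prompt("umbrella", "spreadsheet")
```

Requiring the situation to be fixed before the joke is written is what targets the mid-joke context shift mentioned above: both words have to fit the same scene rather than being introduced one after the other.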

2025

Correctly identifying idiomatic expressions remains a major challenge in Natural Language Processing (NLP), as these expressions often have meanings that cannot be directly inferred from their individual words. SemEval-2025 Task 1 introduces two subtasks, A and B, designed to test models’ ability to interpret idioms using multimodal data, including both text and images. This paper focuses on Subtask A, where the goal is to determine which of several images best represents the intended meaning of an idiomatic expression in a given sentence. To address this, we employed a two-stage approach. First, we used GPT-4o to analyze each sentence, extracting relevant keywords and sentiments to better capture the idiomatic usage. This processed information was then passed to a CLIP-ViT model, which ranked the available images by their relevance to the idiomatic expression. Our results showed that this approach performed significantly better than feeding sentences and idiomatic compounds directly into the models without preprocessing. Specifically, our method achieved a Top-1 accuracy of 0.67 in English, whereas performance in Portuguese was notably lower at 0.23. These findings highlight both the promise of multimodal approaches for idiom interpretation and the challenges posed by language-specific differences in model performance.
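The ranking stage and the Top-1 metric can be sketched as follows. This is a minimal illustration, assuming the text and image embeddings have already been computed; the function names and toy vectors are hypothetical, and cosine similarity is used here because it is the usual CLIP scoring choice, not because the abstract confirms it.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(text_emb: Sequence[float], image_embs: Dict[str, Sequence[float]]) -> List[str]:
    """Return image names ordered from most to least similar to the text embedding."""
    return sorted(image_embs, key=lambda name: cosine(text_emb, image_embs[name]), reverse=True)


def top1_accuracy(rankings: List[List[str]], gold: List[str]) -> float:
    """Fraction of items whose top-ranked image matches the gold image."""
    hits = sum(1 for ranking, answer in zip(rankings, gold) if ranking and ranking[0] == answer)
    return hits / len(gold) if gold else 0.0


# Toy 2-d vectors standing in for real embeddings of the extracted keywords and candidate images.
text_embedding = [1.0, 0.0]
candidates = {"img_a": [0.0, 1.0], "img_b": [1.0, 0.1], "img_c": [0.5, 0.5]}
ranking = rank_images(text_embedding, candidates)
accuracy = top1_accuracy([ranking], ["img_b"])
```

In the two-stage setup, the text embedding would come from the keywords and sentiment extracted in the first stage rather than from the raw sentence.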