Jasper Wilkerson

2026

CUCLASIC at SemEval-2026 Task 5: LLM Prompting Strategies for Rating Ambiguous Word Senses
Federico Ortega Riba | Jasper Wilkerson | Kelsey Lafreniere Adams
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Word sense disambiguation has been a foundational task in computational semantics since the 1990s, but it remains unsolved when it comes to bridging human and computational evaluation of ambiguity. SemEval-2026 Task 5 attempts to address this gap. We test six Large Language Models (LLMs) from the Llama and Gemini families to evaluate how they rate ambiguous textual excerpts, experimenting with zero- and few-shot prompt variants and analyzing how simple linguistic cues improve performance. We propose a method for eliciting human-like ratings from language models by using examples with low and high standard deviations between human ratings. We further evaluate and compare the prediction patterns of different models and how well they align with the human-generated ratings. Our best model (Gemini 3-Flash) achieves a score of 75% on a metric combining Spearman correlation and accuracy within one standard deviation.
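The two measures named in the abstract can be illustrated with a minimal sketch. The abstract does not say how they are combined into one score, so the sketch reports them separately. Everything below is an assumption made for illustration: the function names, the toy ratings, the choice to compare each prediction against the mean human rating, and the use of the population standard deviation as the tolerance.

```python
from math import sqrt
from statistics import mean, pstdev


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def accuracy_within_sd(predictions, human_ratings):
    """Share of predictions within one standard deviation of the human mean.

    Assumption: the tolerance is the population standard deviation of the
    human ratings for that item; the task's exact definition may differ.
    """
    hits = sum(
        abs(pred - mean(ratings)) <= pstdev(ratings)
        for pred, ratings in zip(predictions, human_ratings)
    )
    return hits / len(predictions)


# Hypothetical toy data: four items, three human ratings each.
human_ratings = [[1, 2, 1], [4, 5, 5], [3, 3, 4], [2, 4, 3]]
predictions = [1, 5, 2, 3]

human_means = [mean(r) for r in human_ratings]
print("Spearman:", spearman(predictions, human_means))
print("Accuracy within 1 SD:", accuracy_within_sd(predictions, human_ratings))
```

Computing the rank correlation by hand keeps the sketch dependency-free; in practice a library routine such as `scipy.stats.spearmanr` would likely be used instead.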

