COGNAC at SemEval-2026 Task 5: LLM Ensembles for Human-Level Word Sense Plausibility Rating in Challenging Narratives

Azwad Anjum Islam, Tisa Islam Erana


Abstract
We present a system for SemEval-2026 Task 5 that predicts 1–5 plausibility ratings for candidate senses of homonyms in ambiguous short stories by prompting closed-source LLMs. We evaluate three prompting strategies: zero-shot, chain-of-thought, and comparative prompting that jointly scores competing senses. We also find that simple unweighted ensembling aligns better with subjective human judgments than individual models do. Our official submission ranked 4th on the leaderboard with an average score of 0.86; post-competition experiments improved performance to 0.89.
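As a rough illustration of the unweighted ensembling mentioned in the abstract, the sketch below averages the ratings that several models assign to one candidate sense and keeps the result on the 1–5 scale. The function name, the model names, and the example ratings are hypothetical, and the clipping step is an assumption; the abstract does not specify how the authors combine or post-process model outputs.

```python
from statistics import mean


def ensemble_rating(ratings):
    """Unweighted mean of per-model ratings, kept within the 1-5 scale.

    `ratings` is a sequence of numeric ratings, one per model.
    Assumption: every model contributes equally (no learned weights).
    """
    if not ratings:
        raise ValueError("need at least one model rating")
    avg = float(mean(ratings))
    # Clipping is a defensive assumption in case a model returns an
    # out-of-range value; in-range inputs are unaffected.
    return min(5.0, max(1.0, avg))


# Hypothetical ratings from three models for one (story, sense) pair.
model_ratings = {"model_a": 4, "model_b": 3, "model_c": 5}
print(ensemble_rating(list(model_ratings.values())))
```

Averaging tends to smooth out the idiosyncratic ratings of any single model, which is one plausible reason an ensemble would track averaged human judgments more closely.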
Anthology ID:
2026.semeval-1.414
Volume:
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
3328–3336
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.414/
Cite (ACL):
Azwad Anjum Islam and Tisa Islam Erana. 2026. COGNAC at SemEval-2026 Task 5: LLM Ensembles for Human-Level Word Sense Plausibility Rating in Challenging Narratives. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 3328–3336, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
COGNAC at SemEval-2026 Task 5: LLM Ensembles for Human-Level Word Sense Plausibility Rating in Challenging Narratives (Islam & Erana, SemEval 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.414.pdf
Supplementary material:
2026.semeval-1.414.SupplementaryMaterial.zip