VerbaNexAI at SemEval-2026 Task 5: Few-Shot Chain-of-Thought with Selective Self-Consistency and Isotonic Calibration for Word Sense Plausibility Rating

Daniel Peña Gnecco, Edwin Puertas, Juan Carlos Martinez Santos, Jairo Serrano


Abstract
We present a system for rating word sense plausibility in ambiguous narrative contexts for SemEval-2026 Task 5. Our approach ensembles three large language models (Llama-3.1 70B, Qwen-2.5 32B, and Gemma-2 27B) in a computationally efficient, uncertainty-aware pipeline. We combine few-shot chain-of-thought prompting with selective self-consistency, which applies stochastic multiple sampling only to items identified as inherently ambiguous. This targeted strategy reduces inference cost by approximately 45% while preserving prediction robustness. To correct the systematic bias of LLMs toward extreme ratings, we apply isotonic regression, which brings the output distribution closer to the patterns of human judgment. Our system achieves a Spearman correlation of 0.67 and an accuracy-within-standard-deviation score of 0.76, ranking 34th out of 79 participating teams (top 43%) without any task-specific fine-tuning. Detailed error analysis shows that while our system performs strongly on clear contexts (ρ = 0.78), current prompting paradigms struggle to model multimodal human disagreement in genuinely ambiguous cases (ρ = 0.58), highlighting an important challenge for future work on subjective semantic tasks.
Anthology ID: 2026.semeval-1.190
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 1469–1476
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.190/
Cite (ACL): Daniel Peña Gnecco, Edwin Puertas, Juan Carlos Martinez Santos, and Jairo Serrano. 2026. VerbaNexAI at SemEval-2026 Task 5: Few-Shot Chain-of-Thought with Selective Self-Consistency and Isotonic Calibration for Word Sense Plausibility Rating. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 1469–1476, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): VerbaNexAI at SemEval-2026 Task 5: Few-Shot Chain-of-Thought with Selective Self-Consistency and Isotonic Calibration for Word Sense Plausibility Rating (Peña Gnecco et al., SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.190.pdf
Supplementary material: 2026.semeval-1.190.SupplementaryMaterial.zip