Team CV at SemEval-2026 Task 4: Prompting LLMs and Benchmarking Embedding Models for Narrative Story Similarity

Chandan Kumar R S; Vinay Ulli

Abstract

This paper describes Team CV’s systems for SemEval-2026 Task 4: Narrative Story Similarity and Narrative Representation Learning (Hatzel et al., 2026). For Track A (comparative judgment), we explore five prompting strategies (zero-shot, chain-of-thought, structured feature extraction, pairwise scoring, and few-shot) and QLoRA fine-tuning of smaller models. For Track B (narrative embeddings), we benchmark twelve dedicated text embedding models of varying dimensionality (384–4096) spanning open-source (E5-Large-v2, BGE, GTE, Qwen3 Embedding) and closed-source (OpenAI, Gemini, Mistral) families, and fine-tune Qwen3 Embedding 4B on task-specific triples. Few-shot prompting with Qwen-2.5 7B (64.00%) outperforms all fine-tuned variants (best 57.50%) on Track A; scaling to LLaMA-3.3-70B yields 75.00%. On Track B, OpenAI text-embedding-3-large (3072-d) achieves the best dev accuracy (67.00%), while fine-tuning Qwen3 Embedding 4B (2560-d) on synthetic triples slightly decreases accuracy. Our final submission, LLaMA-3.3-70B (3-shot) for Track A and text-embedding-3-large for Track B, achieves 70.75% and 64.50%, exceeding the GPT-4o-mini and STORY-EMB baselines respectively.
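
The abstract reports Track B results as accuracy over story triples. As an illustration only, the sketch below shows one plausible way such a score could be computed: for each triple, the candidate whose embedding has the higher cosine similarity to the anchor is predicted as the more similar story. The triple layout, the use of cosine similarity, and the names `cosine` and `triple_accuracy` are assumptions made for this example and are not taken from the paper; the vectors are toy values, not real model outputs.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
# Hypothetical triple layout: (anchor, candidate_a, candidate_b, a_is_closer)
Triple = Tuple[Vector, Vector, Vector, bool]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def triple_accuracy(triples: Sequence[Triple]) -> float:
    """Fraction of triples where the embedding-based choice matches the gold label."""
    if not triples:
        raise ValueError("need at least one triple")
    correct = 0
    for anchor, cand_a, cand_b, a_is_closer in triples:
        # Predict candidate A when it is strictly closer to the anchor than B.
        predicted_a = cosine(anchor, cand_a) > cosine(anchor, cand_b)
        if predicted_a == a_is_closer:
            correct += 1
    return correct / len(triples)


# Toy 2-d "embeddings", assumed for illustration only.
toy_triples = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0], True),
    ([0.0, 1.0], [1.0, 0.0], [0.2, 0.8], False),
    ([1.0, 1.0], [1.0, 0.0], [1.0, 0.9], True),   # embedding choice disagrees with label
    ([1.0, 0.0], [1.0, 0.1], [0.0, -1.0], True),
]

print(f"toy accuracy: {triple_accuracy(toy_triples):.2%}")
```

Under these assumptions, swapping embedding models only changes how the vectors are produced; the scoring loop stays the same, which is what makes a benchmark across twelve models of different dimensionality straightforward to run.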

Anthology ID: 2026.semeval-1.374
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 2981–2985
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.374/
Cite (ACL): Chandan Kumar R S and Vinay Ulli. 2026. Team CV at SemEval-2026 Task 4: Prompting LLMs and Benchmarking Embedding Models for Narrative Story Similarity. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2981–2985, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Team CV at SemEval-2026 Task 4: Prompting LLMs and Benchmarking Embedding Models for Narrative Story Similarity (R S & Ulli, SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.374.pdf
