Pragyananda Sahoo

2026

Narrative similarity requires reasoning over the deeper structural properties of stories (shared themes, causal progression, and outcomes) rather than surface-level lexical overlap. We describe AI-Monitors, our system for SemEval-2026 Task 4 (Track A), which determines which of two candidate stories is more narratively similar to a given anchor. We explore a progression of approaches, from embedding-based similarity to structured LLM prompting and ensemble construction, guided by four hypotheses about where narrative reasoning gains can be found. The final system achieves 75% test accuracy on 400 instances, ranking 3rd out of 47 systems and approaching the individual human annotator ceiling of 78%. Our key findings are: i) structured few-shot prompting substantially outperforms dense embedding similarity; ii) selecting ensemble components by how differently they make errors, rather than by accuracy alone, produces stronger predictions; and iii) the way an example is described to the model affects its predictions.

Co-authors

Vishnu Tripathi 1

Venues

SemEval 1
WS 1
