Sebastien Kawada

2026

CascadeMind at SemEval-2026 Task 4: A Hybrid Neuro-Symbolic Cascade for Narrative Similarity
Sebastien Kawada | Dylan Holyoak
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Across self-consistency samples from an LLM, vote agreement tracks instance difficulty: on SemEval-2026 Task 4 (Narrative Story Similarity), supermajority cases (≥ 7/8 votes) resolve at 85% accuracy, split votes at 67%, and perfect ties at 61%, a monotone gradient that holds across the development set. We exploit this in CascadeMind, which routes eight Gemini 2.5 Flash votes by consensus, escalates split votes to additional sampling rounds, and falls through to a symbolic ensemble of theory-inspired narrative signals only on perfect ties (5% of cases). The system reached 72.75% on Track A test, placing 10th of 44 teams. Ablations show that the symbolic component contributes negligibly end-to-end and that nearly all gains come from confidence-aware routing. The takeaway is methodological: for narrative similarity, calibrating when to spend more compute on a hard instance matters more than adding auxiliary representations to reason about it. Code is available at https://github.com/chreia/CascadeMind-ACL.
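The routing described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function name `route_by_consensus`, the callables `sample_more` and `symbolic_fallback`, and the rule of pooling the extra votes and taking the majority after escalation are all assumptions. Only the supermajority threshold (7 of 8) and the three-way split into consensus, escalation, and tie fallback come from the abstract.

```python
from collections import Counter
from typing import Callable, List, Tuple


def route_by_consensus(
    votes: List[str],
    sample_more: Callable[[], List[str]],
    symbolic_fallback: Callable[[], str],
    supermajority: int = 7,
) -> Tuple[str, str]:
    """Return (label, route) for one instance given its initial votes.

    Hypothetical helper: `sample_more` stands in for an extra sampling
    round and `symbolic_fallback` for the symbolic ensemble.
    """
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]

    # Perfect tie on the initial votes: defer to the symbolic fallback.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return symbolic_fallback(), "symbolic"

    # Supermajority: accept the consensus label without further sampling.
    if top_count >= supermajority:
        return top_label, "consensus"

    # Split vote: escalate by pooling additional samples (assumed rule).
    pooled = Counter(votes + sample_more()).most_common()
    if len(pooled) > 1 and pooled[1][1] == pooled[0][1]:
        return symbolic_fallback(), "symbolic"
    return pooled[0][0], "escalated"
```

For example, seven "A" votes and one "B" vote return the consensus label directly, while a 4/4 split goes straight to the fallback.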


AsymVerify at SemEval-2026 Task 6: Asymmetric Confidence-Gated Verification for Political Evasion Detection
Sebastien Kawada
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Political evasion is difficult to detect because evasive answers often appear cooperative while avoiding concrete commitment. We present AsymVerify, a confidence-gated verification system for SemEval-2026 Task 6, a three-way classification of Clear Reply, Ambivalent, and Clear Non-Reply responses. AsymVerify scored 0.85 Macro F1 on the evaluation split (D_eval, n=237), placing 2nd out of 41 teams on the official leaderboard. The system first classifies each question-answer pair, then selectively applies downgrade verification (CR/CNR → AMB) or upgrade verification (AMB → CR) to low-confidence predictions. Development analysis shows that errors concentrate at the Ambivalent boundary in both directions, motivating this asymmetric two-verifier design while confidence gating keeps additional inference cost low. On D_dev (n=308), AsymVerify with GLM-4.7 gains +17.1 Macro F1 over single-pass classification at 1.48 calls/example, and the upgrade verifier alone improves every tested LLM backend on D_dev by +6.8 to +15.2 Macro F1 over its single-pass baseline. Code is available at https://github.com/kaons-research/AsymVerify-ACL.
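The classify-then-verify flow in the abstract might look something like the sketch below. It is a hedged illustration, not the released system: the function name `asym_verify`, the three callables, and the confidence threshold of 0.8 are assumptions. The label set (CR, AMB, CNR) and the two verification directions (CR/CNR → AMB, AMB → CR) come from the abstract.

```python
from typing import Callable, Tuple


def asym_verify(
    qa: str,
    classify: Callable[[str], Tuple[str, float]],
    should_downgrade: Callable[[str, str], bool],
    should_upgrade: Callable[[str], bool],
    threshold: float = 0.8,  # assumed gate value, not from the source
) -> Tuple[str, int]:
    """Return (final_label, number_of_model_calls) for one QA pair."""
    label, confidence = classify(qa)
    calls = 1

    # Confident predictions skip verification, keeping cost low.
    if confidence >= threshold:
        return label, calls

    if label in ("CR", "CNR"):
        # Downgrade verifier: may move a clear label to Ambivalent.
        calls += 1
        if should_downgrade(qa, label):
            label = "AMB"
    elif label == "AMB":
        # Upgrade verifier: may move Ambivalent to Clear Reply.
        calls += 1
        if should_upgrade(qa):
            label = "CR"
    return label, calls
```

Because only low-confidence predictions trigger a second call, the average number of calls per example stays between 1 and 2, which is in line with the 1.48 calls/example reported above.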

Co-authors

Dylan Holyoak (1)

Venues

SemEval (2)
WS (2)
