Vidur Gupta

2026

CausalMinds at SemEval-2026 Task 12: Simple Fine-Tuning with Option Shuffling Outperforms Complex Pipelines for Abductive Event Reasoning
Vidur Gupta | Xiaofei Zhao | Jason Shaye
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

We describe our system for SemEval-2026 Task 12 on Abductive Event Reasoning, which requires identifying the plausible direct cause(s) of real-world events. We systematically evaluate 23 configurations spanning prompting, retrieval-augmented generation, multi-stage verification, and supervised fine-tuning across models of different scales. Across these experiments, fine-tuning GPT-4.1-mini with option-shuffling data augmentation consistently outperformed both more complex multi-stage pipelines and prompting strategies with larger models. Our system scores 0.88 on the test set, ranking 19th of 221 submissions and only 0.07 below the top submission (0.95). Notably, chain-of-thought prompting and multi-stage verification hurt performance relative to simpler baselines, reinforcing that simplicity can outperform complex pipelines. We document these negative results and examine the persistent gap between our development (0.991) and test (0.88) scores.
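The abstract names option shuffling as the data augmentation used for fine-tuning but does not give its details. Below is a minimal sketch of the general technique, assuming a hypothetical example format of `{"question", "options", "answers"}` where `answers` lists the indices of the correct option(s). The function name `shuffle_options` and the `num_copies` parameter are illustrative, not the authors' code. Each copy permutes the options and remaps the gold indices so that every label still points at the same cause text.

```python
import random
from typing import Dict, List


def shuffle_options(example: Dict, num_copies: int, seed: int = 0) -> List[Dict]:
    """Return augmented copies of a multiple-choice example with permuted options.

    Hypothetical input format (an assumption, not the task's official schema):
        {"question": str, "options": List[str], "answers": List[int]}
    where "answers" holds the indices of the correct option(s); more than
    one index is allowed, since an event may have several direct causes.
    """
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    n = len(example["options"])
    copies = []
    for _ in range(num_copies):
        perm = list(range(n))
        rng.shuffle(perm)  # perm[new_position] = old_index
        new_options = [example["options"][old] for old in perm]
        # Invert the permutation so each gold index follows its option text.
        old_to_new = {old: new for new, old in enumerate(perm)}
        new_answers = sorted(old_to_new[a] for a in example["answers"])
        copies.append({
            "question": example["question"],
            "options": new_options,
            "answers": new_answers,
        })
    return copies


if __name__ == "__main__":
    ex = {
        "question": "What directly caused the event?",
        "options": ["cause A", "cause B", "cause C", "cause D"],
        "answers": [1, 3],
    }
    for copy in shuffle_options(ex, num_copies=3, seed=7):
        print(copy["options"], copy["answers"])
```

One plausible motivation for this design is that presenting the same example with the correct cause(s) in different positions discourages a fine-tuned model from learning a positional bias toward particular option letters.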
