Shashikala Kankanamge
2026
Tübingen-CL at SemEval-2026 Task 12: Reinforcement Learning and Verification for Abductive Reasoning
Bolun Liang | Ayperi Khudaybergenova | Shashikala Kankanamge
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
We investigate the reliability of verifier-based pipelines for abductive reasoning in SemEval-2026 Task 12. While reinforcement learning improves the base generator’s performance, we find that incorporating a small-model verifier introduces a significant generalization gap: although effective on validation data, the verifier systematically degrades correct predictions on the unseen test set by appending false positives. Furthermore, we reveal a critical vulnerability in the official evaluation metric, which assigns zero reward to abstentions but does not sufficiently penalize incorrect selections. This asymmetry enables trivial heuristic strategies, such as blindly selecting a default option, to substantially inflate performance, even outperforming more principled reasoning systems. Our analysis demonstrates that current evaluation protocols can misrepresent true reasoning ability and highlights the need for more robust verification methods and scoring schemes.
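The scoring asymmetry described in the abstract can be illustrated with a toy sketch. This is not the official Task 12 metric. The scoring rule, the `wrong_penalty` parameter, the labels, and both strategies below are hypothetical, chosen only to show why a metric that gives zero to abstentions and little or no penalty to wrong selections can favor blind guessing over cautious answering.

```python
def score(prediction, gold, wrong_penalty=0.0):
    """Hypothetical per-item score (an assumption, not the official metric):
    1 for a correct selection, 0 for an abstention (None),
    and -wrong_penalty for an incorrect selection."""
    if prediction is None:
        return 0.0
    return 1.0 if prediction == gold else -wrong_penalty


def mean_score(predictions, golds, wrong_penalty=0.0):
    """Average the hypothetical per-item score over a dataset."""
    total = sum(score(p, g, wrong_penalty) for p, g in zip(predictions, golds))
    return total / len(golds)


# Toy gold labels (invented for illustration); "A" is the most frequent option.
golds = ["A", "B", "A", "C", "A", "D", "A", "B"]

# Strategy 1: blindly select the default option "A" for every item.
always_a = ["A"] * len(golds)

# Strategy 2: answer only when confident, abstain (None) otherwise.
# Here every answer it does give is correct.
cautious = ["A", None, "A", None, None, None, "A", None]

# With no penalty for wrong selections, blind guessing scores higher.
print(mean_score(always_a, golds))  # 0.5
print(mean_score(cautious, golds))  # 0.375

# With a penalty for wrong selections, the ordering reverses.
print(mean_score(always_a, golds, wrong_penalty=1.0))  # 0.0
print(mean_score(cautious, golds, wrong_penalty=1.0))  # 0.375
```

In this toy setting the cautious strategy never makes an error, yet it ranks below blind guessing unless wrong selections carry a cost, which is the kind of misranking the abstract attributes to the evaluation protocol.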