SemEval-2026 Task 11: Disentangling Content and Formal Reasoning in Large Language Models

Marco Valentino, Leonardo Ranaldi, Giulia Pucci, Federico Ranaldi, André Freitas


Abstract
SemEval-2026 Task 11 evaluates the ability of Large Language Models (LLMs) to perform content-independent reasoning through a novel multilingual syllogistic dataset designed to measure the "content effect" — the tendency to conflate semantic plausibility with logical validity. The competition featured four subtasks, covering English and multilingual settings with both standard and noisy premise sets. Evaluations of zero-shot baselines reveal that the content effect is pervasive in open models, whereas newer versions demonstrate a significant shift in performance. Across the subtasks, findings indicate that introducing distracting premises can challenge the models’ ability to filter misleading information, while multilingual settings amplify their susceptibility to content biases compared to English. Participants proposed diverse approaches, including neuro-symbolic decomposition, fine-tuning and distillation methods, data augmentation, and activation steering. While explicit symbolic verification remains the most reliable strategy, activation-level interventions and fine-tuning methods offer promising pathways for internalising formal logic within neural architectures. Ultimately, the task reinforces the efficacy of neuro-symbolic approaches and emerging architectural trends for logical reliability, while also highlighting that multilingual setups and longer contexts still pose significant challenges to be investigated in future research.
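To make the "content effect" described in the abstract concrete, the sketch below computes one possible measure of it: the accuracy gap between items whose logical validity agrees with their semantic plausibility and items where the two conflict. This is an illustrative assumption, not the task's official metric, which the abstract does not specify. The names `Item`, `accuracy`, and `content_effect` are hypothetical, and the data is a toy example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    valid: bool      # gold label: does the conclusion follow from the premises?
    plausible: bool  # does the conclusion agree with world knowledge?
    predicted: bool  # the model's validity judgment


def accuracy(items: List[Item]) -> float:
    """Fraction of items where the predicted validity matches the gold label."""
    if not items:
        return float("nan")
    return sum(i.predicted == i.valid for i in items) / len(items)


def content_effect(items: List[Item]) -> float:
    """Accuracy on congruent items minus accuracy on incongruent items.

    Congruent: validity and plausibility agree.
    Incongruent: valid but implausible, or invalid but plausible.
    A value near 0 suggests content-independent reasoning (assumed measure).
    """
    congruent = [i for i in items if i.valid == i.plausible]
    incongruent = [i for i in items if i.valid != i.plausible]
    return accuracy(congruent) - accuracy(incongruent)


# Toy gold data: (valid, plausible) pairs covering all four combinations.
gold = [(True, True), (False, False), (True, False), (False, True)]

# A hypothetical model that answers purely from plausibility.
plausibility_driven = [Item(v, p, predicted=p) for v, p in gold]

# A hypothetical model that answers purely from logical form.
formal = [Item(v, p, predicted=v) for v, p in gold]

print(content_effect(plausibility_driven))  # 1.0
print(content_effect(formal))               # 0.0
```

Under this assumed measure, a model that relies only on plausibility scores the maximum gap of 1.0, while a model that tracks logical form alone scores 0.0.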
Anthology ID: 2026.semeval-1.450
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 3716–3730
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.450/
Cite (ACL): Marco Valentino, Leonardo Ranaldi, Giulia Pucci, Federico Ranaldi, and André Freitas. 2026. SemEval-2026 Task 11: Disentangling Content and Formal Reasoning in Large Language Models. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 3716–3730, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): SemEval-2026 Task 11: Disentangling Content and Formal Reasoning in Large Language Models (Valentino et al., SemEval 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.450.pdf