GIL-Zaragoza at SemEval 2026 Task 11: Comparing Classification, Autoformalization, and Ontologies for Formal Reasoning Capabilities

Francisco Lopez-Ponce, Lucia Pitarch, Iván Saavedra Martínez, Ignacio Huitzil, Sergio Ojeda Trueba, Fernando Bobillo, Gemma Bel-Enguix


Abstract
This paper describes our participation in Task 11 of SemEval-2026, which evaluates the ability of models to determine the logical validity of syllogisms independent of real-world content. We develop and compare three approaches for Subtask 1: (1) an encoder-based classification baseline using both classical ML methods and fine-tuned BERT with debiasing strategies; (2) an autoformalization pipeline combining DPO-aligned models with first-order logic translation and formal inference via Prover9; and (3) a hybrid neuro-symbolic approach using GPT to generate OWL 2 ontologies evaluated with the HermiT reasoner. Our best result was achieved by the encoder-based classifier, which obtained 72.25% accuracy and a combined score of 20.37, placing 40th out of 45 participating teams. Our analysis shows that classification methods exhibit lower content bias, autoformalization approaches suffer from translation inconsistencies and syntax incompatibilities, and ontology-based reasoning is hindered by prompt design limitations and verbose serialization formats. All our code can be found in the paper’s repository.
Anthology ID:
2026.semeval-1.308
Volume:
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month:
July
Year:
2026
Address:
San Diego, California, USA
Editors:
Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues:
SemEval | WS
Publisher:
Association for Computational Linguistics
Pages:
2438–2446
URL:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.308/
Cite (ACL):
Francisco Lopez-Ponce, Lucia Pitarch, Iván Saavedra Martínez, Ignacio Huitzil, Sergio Ojeda Trueba, Fernando Bobillo, and Gemma Bel-Enguix. 2026. GIL-Zaragoza at SemEval 2026 Task 11: Comparing Classification, Autoformalization, and Ontologies for Formal Reasoning Capabilities. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2438–2446, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal):
GIL-Zaragoza at SemEval 2026 Task 11: Comparing Classification, Autoformalization, and Ontologies for Formal Reasoning Capabilities (Lopez-Ponce et al., SemEval 2026)
PDF:
https://preview.aclanthology.org/ingest-acl-workshops/2026.semeval-1.308.pdf