Sergio Ojeda Trueba
2026
GIL-Zaragoza at SemEval 2026 Task 11: Comparing Classification, Autoformalization, and Ontologies for Formal Reasoning Capabilities
Francisco Lopez-Ponce | Lucia Pitarch | Iván Saavedra Martínez | Ignacio Huitzil | Sergio Ojeda Trueba | Fernando Bobillo | Gemma Bel-Enguix
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
This paper describes our participation in Task 11 of SemEval-2026, which evaluates the ability of models to determine the logical validity of syllogisms independently of their real-world content. We develop and compare three approaches for Subtask 1: (1) an encoder-based classification baseline using both classical ML methods and a fine-tuned BERT model with debiasing strategies; (2) an autoformalization pipeline combining DPO-aligned models with first-order logic translation and formal inference via Prover9; and (3) a hybrid neuro-symbolic approach in which GPT generates OWL 2 ontologies that are then evaluated with the HermiT reasoner. Our best result was achieved by the encoder-based classifier, which obtained 72.25% accuracy and a combined score of 20.37, placing 40th out of 45 participating teams. Our analysis shows that classification methods exhibit lower content bias, autoformalization approaches suffer from translation inconsistencies and syntax incompatibilities, and ontology-based reasoning is hindered by prompt design limitations and verbose serialization formats. All our code is available in the paper's repository.
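Because the task defines validity purely in terms of logical form, a small sketch can make that notion concrete. This is not the Prover9 or HermiT pipeline summarized in the abstract; it swaps in a simpler technique, brute-force model enumeration over the regions of a three-set Venn diagram, and assumes a hypothetical encoding of categorical statements as `(form, subject, predicate)` triples over the terms `S`, `M` and `P`. Universal statements are read without existential import, as in standard first-order logic.

```python
from itertools import combinations, product

# Hypothetical encoding (illustration only): a statement is (form, subject, predicate)
# with form "A" (All S are P), "E" (No S is P), "I" (Some S is P), "O" (Some S is not P).
TERMS = ("S", "M", "P")

# Each cell is one region of the three-set Venn diagram: a tuple of booleans
# saying whether an element in that region belongs to S, M and P.
CELLS = list(product((False, True), repeat=len(TERMS)))


def holds(statement, nonempty_cells):
    """Evaluate one categorical statement in a model described by its nonempty cells."""
    form, subj, pred = statement
    i, j = TERMS.index(subj), TERMS.index(pred)
    if form == "A":  # no element lies in subj but outside pred
        return not any(c[i] and not c[j] for c in nonempty_cells)
    if form == "E":  # no element lies in both subj and pred
        return not any(c[i] and c[j] for c in nonempty_cells)
    if form == "I":  # some element lies in both subj and pred
        return any(c[i] and c[j] for c in nonempty_cells)
    if form == "O":  # some element lies in subj but not in pred
        return any(c[i] and not c[j] for c in nonempty_cells)
    raise ValueError(f"unknown form: {form}")


def is_valid(premises, conclusion):
    """Return True iff the conclusion holds in every model where all premises hold.

    With only monadic predicates, truth values depend solely on which Venn cells
    are nonempty, so checking every nonempty choice of cells (first-order domains
    are nonempty) covers all models.
    """
    for size in range(1, len(CELLS) + 1):
        for nonempty in combinations(CELLS, size):
            if all(holds(p, nonempty) for p in premises) and not holds(conclusion, nonempty):
                return False  # a countermodel exists, so the form is invalid
    return True
```

Only the forms and term positions enter the check, so substituting real-world or nonsense words for `S`, `M` and `P` cannot change the verdict. For instance, Barbara (All M are P; All S are M; therefore All S are P) returns `True`, while the undistributed-middle form (All P are M; All S are M; therefore All S are P) returns `False`.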