Francisco Fernando Lopez-Ponce
2025
Into The Limits of Logic: Alignment Methods for Formal Logical Reasoning
Francisco Fernando Lopez-Ponce | Gemma Bel-Enguix
Proceedings of The 3rd Workshop on Mathematical Natural Language Processing (MathNLP 2025)
We apply Large Language Model alignment algorithms to formal logic reasoning tasks involving natural-language (NL) to first-order logic (FOL) translation, formal logic inference, and premise retranslation. These methods were implemented using task-specific preference datasets built from the FOLIO datasets and LLM generations. Alignment was based on DPO; the algorithm was implemented and tested on off-the-shelf and pre-aligned models, showing promising results for higher-quality NL-FOL parsing as well as for general alignment strategies. In addition, we introduce a new similarity metric (LogicSim) between LLM-generated responses and gold-standard values, which measures logic-relevant information such as premise count and overlap between answers and expands evaluation of NL-FOL translation pipelines. Our results show that LLMs still struggle with logical inference; however, alignment benefits semantic parsing and the retranslation of results from formal logic back to natural language.
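A minimal, hypothetical sketch of a LogicSim-style comparison (the exact formulation is defined in the paper, not here): it scores an LLM-generated set of FOL premises against a gold translation using premise-count agreement and set overlap, the kinds of logic-relevant signals mentioned in the abstract. The function names, the normalization step, and the equal weighting are illustrative assumptions.

# Hypothetical LogicSim-style scorer; not the paper's implementation.
def normalize(premise: str) -> str:
    # Normalize a FOL premise string for comparison (collapse whitespace, lowercase).
    return " ".join(premise.split()).lower()

def logicsim_like(generated: list[str], gold: list[str]) -> float:
    # Combine premise-count agreement with Jaccard overlap of normalized premises.
    if not generated or not gold:
        return 0.0
    gen = {normalize(p) for p in generated}
    ref = {normalize(p) for p in gold}
    count_ratio = min(len(gen), len(ref)) / max(len(gen), len(ref))
    overlap = len(gen & ref) / len(gen | ref)
    return 0.5 * count_ratio + 0.5 * overlap  # equal weights are an assumption

# Example: a generation that recovers two of three gold premises.
print(logicsim_like(
    ["forall x (Cat(x) -> Animal(x))", "Cat(tom)"],
    ["forall x (cat(x) -> animal(x))", "cat(tom)", "animal(tom)"],
))  # prints roughly 0.67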
2024
WikiBias as an Extrapolation Corpus for Bias Detection
K. Salas-Jimenez | Francisco Fernando Lopez-Ponce | Sergio-Luis Ojeda-Trueba | Gemma Bel-Enguix
Proceedings of the First Workshop on Advancing Natural Language Processing for Wikipedia
This paper explores whether a machine learning model trained on Wikipedia data can detect subjectivity in sentences and generalize effectively to other domains. To this end, we ran experiments with the WikiBias corpus, the BABE corpus, and the CheckThat! dataset. Several classical ML models were tested, including Logistic Regression, SVC, and SVR, using features such as Sentence Transformers similarity, probabilistic sentiment measures, and bias lexicons. Pre-trained models like DistilRoBERTa, as well as large language models like Gemma and GPT-4, were also tested on the same classification task.
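A minimal sketch (not the authors' code) of the classical-model setup described above: sentence-transformer embeddings fed to a scikit-learn Logistic Regression for subjectivity classification. The embedding model name, the toy sentences, and their labels are placeholder assumptions.

# Illustrative only: Logistic Regression over sentence-transformer embeddings.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Placeholder data: 1 = subjective/biased wording, 0 = neutral.
train_sents = ["The senator bravely defended the controversial bill.",
               "The bill passed by a vote of 52 to 48."]
train_labels = [1, 0]
test_sents = ["Critics slammed the reckless decision.",
              "The committee met on Tuesday."]

# Encode sentences; the embedding model choice is an assumption.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
X_train = encoder.encode(train_sents)
X_test = encoder.encode(test_sents)

# Train the classical classifier and predict subjectivity labels.
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
print(clf.predict(X_test))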