Yashar Moshfeghi


2025

pdf bib
POW: Political Overton Windows of Large Language Models
Leif Azzopardi | Yashar Moshfeghi
Findings of the Association for Computational Linguistics: EMNLP 2025

Political bias in Large Language Models (LLMs) presents a growing concern for the responsible deployment of AI systems. Traditional audits often attempt to locate a model’s political position as a point estimate, masking the broader set of ideological boundaries that shape what a model is willing or unwilling to say. In this paper, we draw upon the concept of the Overton Window as a framework for mapping these boundaries: the range of political views that a given LLM will espouse, remain neutral on, or refuse to endorse. To uncover these windows, we applied an auditing-based methodology, called PRISM, that probes LLMs through task-driven prompts designed to elicit political stances indirectly. Using the Political Compass Test, we evaluated twenty-eight LLMs from eight providers to reveal their distinct Overton Windows. While many models default to economically left and socially liberal positions, we show that their willingness to express or reject certain positions varies considerably, where DeepSeek models tend to be very restrictive in what they will discuss and Gemini models tend to be most expansive. Our findings demonstrate that Overton Windows offer a richer, more nuanced view of political bias in LLMs and provide a new lens for auditing their normative boundaries.

2024

pdf bib
GOLD: Geometry Problem Solver with Natural Language Description
Jiaxin Zhang | Yashar Moshfeghi
Findings of the Association for Computational Linguistics: NAACL 2024

Addressing the challenge of automated geometry math problem-solving in artificial intelligence (AI) involves understanding multi-modal information and mathematics. blackCurrent methods struggle with accurately interpreting geometry diagrams, which hinders effective problem-solving. To tackle this issue, we present the Geometry problem sOlver with natural Language Description (GOLD) model. GOLD enhances the extraction of geometric relations by separately processing symbols and geometric primitives within the diagram. Subsequently, it converts the extracted relations into natural language descriptions, efficiently utilizing large language models to solve geometry math problems. Experiments show that the GOLD model outperforms the Geoformer model, the previous best method on the UniGeo dataset, by achieving accuracy improvements of 12.7% and 42.1% in calculation and proving subsets. Additionally, it surpasses the former best model on the PGPS9K and Geometry3K datasets, PGPSNet, by obtaining accuracy enhancements of 1.8% and 3.2%, respectively.

pdf bib
GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving
Jiaxin Zhang | Zhong-Zhi Li | Ming-Liang Zhang | Fei Yin | Cheng-Lin Liu | Yashar Moshfeghi
Findings of the Association for Computational Linguistics: ACL 2024

Recent advancements in large language models (LLMs) and multi-modal models (MMs) have demonstrated their remarkable capabilities in problem-solving. Yet, their proficiency in tackling geometry math problems, which necessitates an integrated understanding of both textual and visual information, has not been thoroughly evaluated. To address this gap, we introduce the GeoEval benchmark, a comprehensive collection that includes a main subset of 2,000 problems, a 750 problems subset focusing on backward reasoning, an augmented sub- set of 2,000 problems, and a hard subset of 300 problems. This benchmark facilitates a deeper investigation into the performance of LLMs and MMs in solving geometry math problems. Our evaluation of ten LLMs and MMs across these varied subsets reveals that the WizardMath model excels, achieving a 55.67% accuracy rate on the main subset but only a 6.00% accuracy on the hard subset. This highlights the critical need for testing models against datasets on which they have not been pre-trained. Additionally, our findings indicate that GPT-series models perform more effectively on problems they have rephrased, suggesting a promising method for enhancing model capabilities.