Ming Kong
2026
UrbanGeoEval: A City-Scale Benchmark for Evaluating Large Language Models in Geospatial Reasoning
Mutian Bao | Qiuyi Qi | Tian Liang | Jinjian Zhang | Wei Zhou | Ming Kong | Linjian Mo | Qiang Zhu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Current evaluations of geospatial reasoning in LLMs are frequently impeded by the entanglement of factual recall and spatial logic, which often obscures the models’ true capabilities in complex city-scale environments. To address this, we introduce UrbanGeoEval, a comprehensive benchmark featuring a dual-module framework designed to disentangle these competencies. The Knowledge Module assesses urban memory via scalable map-based queries, while the Reasoning Module isolates pure logical inference across 3,148 realistic tasks by providing necessary geospatial context. Unlike prior benchmarks that hand the model pre-computed spatial text, UrbanGeoEval provides raw geometry and forces the model to act as a spatial computing engine. Our evaluation methodology introduces a reliable hybrid pipeline that merges deterministic programmatic checks with an LLM-as-a-Judge, achieving expert-level evaluation accuracy. Extensive experiments on 18 widely used LLMs uncover critical insights: (1) models exhibit severe geographic biases and resolution gaps; (2) failures in complex multi-hop tasks often stem from brittle foundational spatial skills rather than high-level logic deficits. UrbanGeoEval provides a precise diagnostic tool for advancing urban geospatial intelligence in LLMs.
CARL: Constraint-Aware Reinforcement Learning for Planning with LLMs
Qiuyi Qi | Jinjian Zhang | Mutian Bao | Tian Liang | Guocong Li | Dongnan Liu | Wei Zhou | Jie Liu | Ming Kong | Linjian Mo | Feng Zhang | Qiang Zhu
Findings of the Association for Computational Linguistics: ACL 2026
Despite their strong reasoning capabilities and extensive world knowledge, Large Language Models (LLMs) frequently generate plans that violate task constraints, undermining their reliability in real-world applications. This deficiency arises from a lack of systematic mechanisms to incorporate constraint information during the generation process. While existing approaches attempt to mitigate this by relying on external tools or task decomposition, they fail to enhance the model’s intrinsic constraint awareness. To address this, we propose Constraint-Aware Reinforcement Learning (CARL), a novel RL framework designed to strengthen LLMs’ intrinsic focus on constraints. CARL introduces a constraint-aware reward by comparing the model’s output distributions under constrained and unconstrained inputs, encouraging constraint focus and penalizing neglect. Compatible with various RL frameworks and requiring no external solvers or top models, CARL enables scalable, end-to-end constraint-aware planning. Extensive experiments on BlocksWorld, TravelPlanner, and T-Eval demonstrate that CARL significantly outperforms standard Reinforcement Fine-Tuning (RFT) baselines and state-of-the-art reasoning models, exhibiting a markedly increased focus on constraints.
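The abstract does not state the reward formula. As a rough illustration only, one plausible reading of "comparing the model's output distributions under constrained and unconstrained inputs" is a divergence between the two sets of next-token distributions. The sketch below assumes a mean per-token KL divergence; the function names `kl_divergence` and `constraint_focus_score` are hypothetical and are not taken from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def constraint_focus_score(constrained_dists, unconstrained_dists):
    """Mean per-token KL between next-token distributions produced
    with the constraints in the prompt and without them.

    Assumption: a larger value means the constraints changed the model's
    predictions more, which is treated here as a proxy for constraint focus.
    """
    if len(constrained_dists) != len(unconstrained_dists):
        raise ValueError("both sequences must cover the same tokens")
    if not constrained_dists:
        return 0.0
    total = sum(
        kl_divergence(p, q)
        for p, q in zip(constrained_dists, unconstrained_dists)
    )
    return total / len(constrained_dists)

# Toy example: the constraints shift the first token's distribution only.
with_constraints = [[0.9, 0.1], [0.5, 0.5]]
without_constraints = [[0.5, 0.5], [0.5, 0.5]]
score = constraint_focus_score(with_constraints, without_constraints)
```

A score of zero would indicate that the model's predictions are unaffected by the constraints, which is the kind of neglect the abstract says CARL penalizes. How the actual reward is shaped and combined with the task reward is not specified in the abstract.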
STAPO: Selective Trajectory-Aware Policy Optimization for LLM Agent Training
Qiuyi Qi | Tian Liang | Mutian Bao | Jinjian Zhang | Dongnan Liu | Wei Zhou | Linjian Mo | Ming Kong | Jie Liu | Feng Zhang | Qiang Zhu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Reinforcement Learning (RL) is the dominant paradigm for training Large Language Model (LLM) agents on long-horizon tasks. However, sparse and delayed rewards often lead to trajectory neglect, in which agents lose focus on the task goal and interaction history at intermediate steps. Prior work has explored step-level supervision using Shannon-entropy–based uncertainty signals, which conflate inherent state complexity with agent confidence and therefore provide unreliable estimates of decision reliability. To address this issue, we propose normalized entropy, which measures confidence deviations relative to an agent’s average behavior under a given state, thereby strengthening the association between low-quality actions and trajectory neglect. Building on this insight, we introduce Selective Trajectory-Aware Policy Optimization (STAPO), a hierarchical group-based RL framework. STAPO leverages normalized entropy to locate outlier steps associated with trajectory neglect and optimizes them via a joint mechanism of trajectory-aware reward and trajectory-independent penalty, enhancing trajectory awareness while preserving training stability. Extensive experiments on ALFWorld, WebShop, and Search-Augmented QA demonstrate that STAPO achieves state-of-the-art performance while substantially alleviating trajectory neglect, validating its effectiveness and robustness for agentic tasks.
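The abstract does not define normalized entropy precisely. As a minimal sketch, assuming it can be approximated by standardizing each step's Shannon entropy against the mean and spread of entropies observed for comparable steps, outlier steps could be located as follows. The z-score form, the threshold value, and the function names are assumptions for illustration, not the authors' formulation.

```python
import math
from statistics import mean, pstdev

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def normalized_entropies(step_dists):
    """Z-score each step's entropy against the group's average behavior.

    Assumption: `step_dists` holds action distributions for steps that are
    comparable (for example, sampled under the same state).
    """
    entropies = [shannon_entropy(d) for d in step_dists]
    mu = mean(entropies)
    sigma = pstdev(entropies)
    if sigma == 0:
        return [0.0] * len(entropies)
    return [(e - mu) / sigma for e in entropies]

def outlier_steps(step_dists, threshold=1.5):
    """Indices of steps whose normalized entropy exceeds the threshold."""
    return [
        i for i, z in enumerate(normalized_entropies(step_dists))
        if z > threshold
    ]

# Four confident steps and one unusually uncertain step.
dists = [[0.9, 0.1]] * 4 + [[0.5, 0.5]]
flagged = outlier_steps(dists)
```

Raw entropy alone would rank every step in a complex state as uncertain. Measuring the deviation from the group average is meant to separate a state's inherent complexity from the agent's confidence, which is the distinction the abstract draws.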