Zonghao Guo
2026
RSMeM: Knowledge-Enhanced Memory Evolution for Remote Sensing Agents with Systematic Evaluation
Bingxian Wu | Yu Zhang | Zonghao Guo | Tang Liu | Chen Qian | Yuxiang Lu | Xingbo Du | Yanghao Li | Yidan Zhang | Chi Chen | Ling Yao | Chenghu Zhou | Maosong Sun
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Geoscience research requires complex analysis and domain expertise, with remote sensing (RS) observations as a key foundation. However, existing RS agents built on general-purpose LLMs remain largely domain-agnostic, resulting in brittle and error-prone workflows. Moreover, these failures are seldom consolidated into reusable experience for subsequent analyses. To address this issue, we introduce RSMeM, a knowledge-enhanced memory evolution mechanism that bootstraps RS agents with pre-distilled domain knowledge and iteratively integrates online experience for robust multi-step tool execution. RSMeM is composed of two components: (i) Hierarchical Knowledge Grounding, which performs taxonomy-aware retrieval over a hierarchical domain corpus to guide planning and tool selection; and (ii) Failure-Aware Experience Refinement, which distills failure-annotated tool-use traces into reusable constraints for next-round tool execution. By iteratively employing these two processes, RS agents can evolve to absorb task-level domain knowledge and effectively translate it into instance-level execution experience. Extensive experiments on EarthBench demonstrate that RSMeM consistently improves tool-use performance and end-to-end accuracy across a diverse set of LLM backbones. Notably, RSMeM achieves a 6% accuracy improvement on DeepSeek-V3.2 with less than 1% additional experience tokens, demonstrating the knowledge density of our distilled experience. All code and models will be released to support reproducible research.
2025
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
You Li | Heyu Huang | Chi Chen | Kaiyu Huang | Chao Huang | Zonghao Guo | Zhiyuan Liu | Jinan Xu | Yuhua Li | Ruixuan Li | Maosong Sun
Findings of the Association for Computational Linguistics: ACL 2025
The recent advancement of Multimodal Large Language Models (MLLMs) has significantly improved their fine-grained perception of single images and general comprehension across multiple images. However, existing MLLMs still face challenges in achieving precise grounding in complex multi-image scenarios. To address this, we first explore a Chain-of-Thought (CoT) framework that integrates single-image grounding with multi-image comprehension. While partially effective, it remains unstable and struggles to capture abstract visual information due to its non-end-to-end nature. Therefore, we introduce Migician, the first multi-image grounding model capable of performing free-form and accurate grounding across multiple images. To support this, we present the MGrounding-630k dataset, which comprises data for several multi-image grounding tasks derived from existing datasets, along with newly generated free-form grounding instruction-following data. Furthermore, we propose MIG-Bench, a comprehensive benchmark specifically designed for evaluating multi-image grounding capabilities. Experimental results demonstrate that our model achieves significantly superior multi-image grounding capabilities, outperforming the best existing MLLMs by 24.94% and even surpassing much larger 70B models. Our code, model, dataset, and benchmark are fully open-sourced at https://migician-vg.github.io/.