GR1: Reinforcement-Enhanced LLM for Geoscience Reasoning
Yule Xie, Jiaxin Ding, Cheng Deng, Shiqing Gao, Junran Zhang, Sibo Zhang, Zeyuan Wang, Ke Wu, Xin Ding, Luoyi Fu, Meng Jin, Xinbing Wang
Abstract
Reinforcement learning (RL) has recently shown remarkable ability to enhance reasoning in large language models (LLMs), yet its potential in scientific domains beyond mathematics remains largely unexplored. Geoscience questions couple broad factual knowledge with multi-step inference and often rely on visual evidence such as maps, cross-sections, and diagrams, making them a challenging but verifiable testbed for RL-based reasoning. To enable this study, we introduce GeoMC-10K, a dataset of 10,000 geoscience multiple-choice questions spanning physical to human geography and high-school to professional levels; over 30% of the questions are image-dependent. To support text-only RL on these multimodal questions, we design GeoM2T, a multi-agent framework that converts multimodal questions into descriptive text while preserving answerability and difficulty. Fine-tuning LLaMA-3.1-8B and Qwen-3-8B with Group Relative Policy Optimization (GRPO), incorporating a factual reward mechanism, yields GR1, which achieves absolute accuracy improvements of 5.9% and 13.3%, respectively, and generalizes to out-of-distribution geoscience benchmarks. Together, GeoMC-10K, GeoM2T, and GR1 establish a scalable benchmark and baseline for RL-enhanced geoscience reasoning.
- Anthology ID:
- 2026.findings-acl.1140
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 22730–22743
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1140/
- Cite (ACL):
- Yule Xie, Jiaxin Ding, Cheng Deng, Shiqing Gao, Junran Zhang, Sibo Zhang, Zeyuan Wang, Ke Wu, Xin Ding, Luoyi Fu, Meng Jin, and Xinbing Wang. 2026. GR1: Reinforcement-Enhanced LLM for Geoscience Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 22730–22743, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- GR1: Reinforcement-Enhanced LLM for Geoscience Reasoning (Xie et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1140.pdf