GR1: Reinforcement-Enhanced LLM for Geoscience Reasoning

Yule Xie, Jiaxin Ding, Cheng Deng, Shiqing Gao, Junran Zhang, Sibo Zhang, Zeyuan Wang, Ke Wu, Xin Ding, Luoyi Fu, Meng Jin, Xinbing Wang


Abstract
Reinforcement learning (RL) has recently shown remarkable ability to enhance reasoning in large language models (LLMs), yet its potential in scientific domains beyond mathematics remains largely unexplored. Geoscience questions couple broad factual knowledge with multi-step inference and often rely on visual evidence such as maps, cross-sections, and diagrams, making them a challenging but verifiable testbed for RL-based reasoning. To enable this study, we introduce GeoMC-10K, a dataset of 10,000 geoscience multiple-choice questions spanning physical to human geography and high-school to professional levels; over 30% of the questions are image dependent. To support text-only RL on these multimodal questions, we design GeoM2T, a multi-agent framework that converts multimodal questions into descriptive text while preserving answerability and difficulty. Fine-tuning LLaMA-3.1-8B and Qwen-3-8B with Group Relative Policy Optimization (GRPO), incorporating a factual reward mechanism, yields GR1, which achieves absolute accuracy improvements of 5.9% and 13.3%, respectively, and generalizes to out-of-distribution geoscience benchmarks. Together, GeoMC-10K, GeoM2T, and GR1 establish a scalable benchmark and baseline for RL-enhanced geoscience reasoning.
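As a rough illustration of why multiple-choice questions make a verifiable RL testbed, the sketch below shows the group-relative advantage step that GRPO is generally described as using: several answers are sampled for the same question, each is scored, and the scores are normalized within the group. This is only a minimal sketch under stated assumptions, not the paper's implementation. The function names, the 0/1 correctness reward, the use of the population standard deviation, and the `eps` constant are all assumptions made for illustration; the abstract does not specify how the factual reward mechanism is computed, so it is not modeled here.

```python
from statistics import mean, pstdev


def choice_reward(predicted: str, gold: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the chosen option matches the key, else 0.0."""
    return 1.0 if predicted.strip().upper() == gold.strip().upper() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of answers sampled for the same question.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every sample receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one multiple-choice question whose answer key is "B".
samples = ["B", "C", "b", "A"]
rewards = [choice_reward(s, "B") for s in samples]
advantages = group_relative_advantages(rewards)

for sample, reward, adv in zip(samples, rewards, advantages):
    print(f"answer={sample} reward={reward} advantage={adv:+.3f}")
```

In this sketch, correct samples receive positive advantages and incorrect ones negative advantages, while a group in which every sample gets the same reward yields all-zero advantages and therefore no learning signal for that question.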
Anthology ID:
2026.findings-acl.1140
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
22730–22743
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1140/
Cite (ACL):
Yule Xie, Jiaxin Ding, Cheng Deng, Shiqing Gao, Junran Zhang, Sibo Zhang, Zeyuan Wang, Ke Wu, Xin Ding, Luoyi Fu, Meng Jin, and Xinbing Wang. 2026. GR1: Reinforcement-Enhanced LLM for Geoscience Reasoning. In Findings of the Association for Computational Linguistics: ACL 2026, pages 22730–22743, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
GR1: Reinforcement-Enhanced LLM for Geoscience Reasoning (Xie et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.1140.pdf
Checklist:
 2026.findings-acl.1140.checklist.pdf