Lunar-Bench: Towards Evaluating Task-Oriented Reasoning of LLMs in Lunar Exploration Scenarios
Xin-Yu Xiao, Ye Tian, Erwei Yin, Zhixian He, Shiqi Wang, Yalei Liu, Qianchen Xia
Abstract
The increasing complexity of lunar exploration calls for intelligent systems capable of supporting autonomous operations and scientific decision-making under uncertain and resource-limited conditions. Advances in large language models (LLMs) create new opportunities for mission planning, but their reliability in dynamic, safety-critical environments remains insufficiently evaluated. Existing benchmarks focus on static, context-independent reasoning tasks and fail to capture the constraints and dependencies of lunar missions. To address this gap, we introduce Lunar-Bench, a benchmark designed to assess the task-oriented reasoning and decision-making performance of LLMs through 3,000 tasks derived from mission procedures and documentation. We further propose the Environmental Scenario Indicators (ESI), a process-based framework that evaluates safety, efficiency, integrity, and alignment beyond conventional accuracy. Experiments on 36 representative models show that the best model achieves 47.8% accuracy, compared with 65.1% for human experts. Lunar-Bench and ESI together provide a principled foundation for developing reliable systems for future missions.
- Anthology ID:
- 2026.findings-acl.83
- Volume:
- Findings of the Association for Computational Linguistics: ACL 2026
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, California, United States
- Editors:
- Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1668–1705
- URL:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.83/
- Cite (ACL):
- Xin-Yu Xiao, Ye Tian, Erwei Yin, Zhixian He, Shiqi Wang, Yalei Liu, and Qianchen Xia. 2026. Lunar-Bench: Towards Evaluating Task-Oriented Reasoning of LLMs in Lunar Exploration Scenarios. In Findings of the Association for Computational Linguistics: ACL 2026, pages 1668–1705, San Diego, California, United States. Association for Computational Linguistics.
- Cite (Informal):
- Lunar-Bench: Towards Evaluating Task-Oriented Reasoning of LLMs in Lunar Exploration Scenarios (Xiao et al., Findings 2026)
- PDF:
- https://preview.aclanthology.org/ingest-acl/2026.findings-acl.83.pdf