Lunar-Bench: Towards Evaluating Task-Oriented Reasoning of LLMs in Lunar Exploration Scenarios

Xin-Yu Xiao, Ye Tian, Erwei Yin, Zhixian He, Shiqi Wang, Yalei Liu, Qianchen Xia


Abstract
The increasing complexity of lunar exploration calls for intelligent systems capable of supporting autonomous operations and scientific decision-making under uncertain and resource-limited conditions. Advances in large language models (LLMs) create new opportunities for mission planning, but their reliability in dynamic, safety-critical environments remains insufficiently evaluated. Existing benchmarks focus on static, context-independent reasoning tasks and fail to capture the constraints and dependencies of lunar missions. To address this gap, we introduce Lunar-Bench, a benchmark designed to assess the task-oriented reasoning and decision-making performance of LLMs through 3,000 tasks derived from mission procedures and documentation. We further propose the Environmental Scenario Indicators (ESI), a process-based framework that evaluates safety, efficiency, integrity, and alignment beyond conventional accuracy. Experiments on 36 representative models show that the best model achieves 47.8% accuracy, compared with 65.1% for human experts. Lunar-Bench and ESI together provide a principled foundation for developing reliable systems for future missions.
Anthology ID:
2026.findings-acl.83
Volume:
Findings of the Association for Computational Linguistics: ACL 2026
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
1668–1705
URL:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.83/
Cite (ACL):
Xin-Yu Xiao, Ye Tian, Erwei Yin, Zhixian He, Shiqi Wang, Yalei Liu, and Qianchen Xia. 2026. Lunar-Bench: Towards Evaluating Task-Oriented Reasoning of LLMs in Lunar Exploration Scenarios. In Findings of the Association for Computational Linguistics: ACL 2026, pages 1668–1705, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Lunar-Bench: Towards Evaluating Task-Oriented Reasoning of LLMs in Lunar Exploration Scenarios (Xiao et al., Findings 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.findings-acl.83.pdf
Checklist:
 2026.findings-acl.83.checklist.pdf