Qianchen Xia
2026
Lunar-Bench: Towards Evaluating Task-Oriented Reasoning of LLMs in Lunar Exploration Scenarios
Xin-Yu Xiao | Ye Tian | Erwei Yin | Zhixian He | Shiqi Wang | Yalei Liu | Qianchen Xia
Findings of the Association for Computational Linguistics: ACL 2026
The increasing complexity of lunar exploration calls for intelligent systems capable of supporting autonomous operations and scientific decision-making under uncertain and resource-limited conditions. Advances in large language models (LLMs) create new opportunities for mission planning, but their reliability in dynamic, safety-critical environments remains insufficiently evaluated. Existing benchmarks focus on static, context-independent reasoning tasks and fail to capture the constraints and dependencies of lunar missions. To address this gap, we introduce Lunar-Bench, a benchmark designed to assess the task-oriented reasoning and decision-making performance of LLMs through 3,000 tasks derived from mission procedures and documentation. We further propose the Environmental Scenario Indicators (ESI), a process-based framework that evaluates safety, efficiency, integrity, and alignment beyond conventional accuracy. Experiments on 36 representative models show that the best model achieves 47.8% accuracy, compared with 65.1% for human experts. Lunar-Bench and ESI together provide a principled foundation for developing reliable systems for future missions.
2025
Lunar Twins: We Choose to Go to the Moon with Large Language Models
Xin-Yu Xiao | Yalei Liu | Xiangyu Liu | Zengrui Li | Erwei Yin | Qianchen Xia
Findings of the Association for Computational Linguistics: ACL 2025
In recent years, the rapid advancement of large language models (LLMs) has significantly reshaped the landscape of scientific research. While LLMs have achieved notable success across various domains, their application in specialized fields such as lunar exploration remains underdeveloped, and their potential in this domain has yet to be fully realized. To address this gap, we introduce Lunar Twins, the first LLMs designed specifically for lunar exploration, along with a collaborative framework that combines large and small models. Additionally, we present Lunar GenData, a multi-agent collaborative workflow for generating lunar instructions, and establish the first specialized lunar dataset, which integrates real data from the Chang’e lunar missions. Lastly, we develop Lunar Eval, the first comprehensive evaluation suite for assessing the capabilities of LLMs in lunar exploration tasks. Experimental validation demonstrates that our approach not only enhances domain expertise in lunar exploration but also reveals preliminary indications of embodied intelligence potential.