Erwei Yin
2026
Lunar-Bench: Towards Evaluating Task-Oriented Reasoning of LLMs in Lunar Exploration Scenarios
Xin-Yu Xiao | Ye Tian | Erwei Yin | Zhixian He | Shiqi Wang | Yalei Liu | Qianchen Xia
Findings of the Association for Computational Linguistics: ACL 2026
Xin-Yu Xiao | Ye Tian | Erwei Yin | Zhixian He | Shiqi Wang | Yalei Liu | Qianchen Xia
Findings of the Association for Computational Linguistics: ACL 2026
The increasing complexity of lunar exploration calls for intelligent systems capable of supporting autonomous operations and scientific decision-making under uncertain and resource-limited conditions. Advances in large language models (LLMs) create new opportunities for mission planning, but their reliability in dynamic, safety-critical environments remains insufficiently evaluated. Existing benchmarks focus on static, context-independent reasoning tasks and fail to capture the constraints and dependencies of lunar missions. To address this gap, we introduce Lunar-Bench, a benchmark designed to assess the task-oriented reasoning and decision-making performance of LLMs through 3,000 tasks derived from mission procedures and documentation. We further propose the Environmental Scenario Indicators, a process-based framework that evaluates safety, efficiency, integrity, and alignment beyond conventional accuracy. Experiments on 36 representative models show that the best achieves 47.8% accuracy compared with 65.1% for human experts. Lunar-Bench and ESI together provide a principled foundation for developing reliable systems for future missions.
2025
Lunar Twins: We Choose to Go to the Moon with Large Language Models
Xin-Yu Xiao | Yalei Liu | Xiangyu Liu | Zengrui Li | Erwei Yin | Qianchen Xia
Findings of the Association for Computational Linguistics: ACL 2025
Xin-Yu Xiao | Yalei Liu | Xiangyu Liu | Zengrui Li | Erwei Yin | Qianchen Xia
Findings of the Association for Computational Linguistics: ACL 2025
In recent years, the rapid advancement of large language models (LLMs) has significantly reshaped the landscape of scientific research. While LLMs have achieved notable success across various domains, their application in specialized fields such as lunar exploration remains underdeveloped, and their full potential in this domain has yet to be fully realized. To address this gap, we introduce Lunar Twins, the first LLMs designed specifically for lunar exploration, along with a collaborative framework that combines both large and small models. Additionally, we present Lunar GenData, a multi-agent collaborative workflow for generating lunar instructions, and establish the first specialized lunar dataset, which integrates real data from the Chang’e lunar missions. Lastly, we developed Lunar Eval, the first comprehensive evaluation suite for assessing the capabilities of LLMs in lunar exploration tasks. Experimental validation demonstrates that our approach not only enhances domain expertise in lunar exploration but also reveals preliminary indications of embodied intelligence potential.
2024
Landmark-Guided Cross-Speaker Lip Reading with Mutual Information Regularization
Linzhi Wu | Xingyu Zhang | Yakun Zhang | Changyan Zheng | Tiejun Liu | Liang Xie | Ye Yan | Erwei Yin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Linzhi Wu | Xingyu Zhang | Yakun Zhang | Changyan Zheng | Tiejun Liu | Liang Xie | Ye Yan | Erwei Yin
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Lip reading, the process of interpreting silent speech from visual lip movements, has gained rising attention for its wide range of realistic applications. Deep learning approaches greatly improve current lip reading systems. However, lip reading in cross-speaker scenarios where the speaker identity changes, poses a challenging problem due to inter-speaker variability. A well-trained lip reading system may perform poorly when handling a brand new speaker. To learn a speaker-robust lip reading model, a key insight is to reduce visual variations across speakers, avoiding the model overfitting to specific speakers. In this work, in view of both input visual clues and latent representations based on a hybrid CTC/attention architecture, we propose to exploit the lip landmark-guided fine-grained visual clues instead of frequently-used mouth-cropped images as input features, diminishing speaker-specific appearance characteristics. Furthermore, a max-min mutual information regularization approach is proposed to capture speaker-insensitive latent representations. Experimental evaluations on public lip reading datasets demonstrate the effectiveness of the proposed approach under the intra-speaker and inter-speaker conditions.