Humanity’s Last Code Exam: Can Advanced LLMs Conquer Human’s Hardest Code Competition?
Xiangyang Li, Xiaopeng Li, Kuicai Dong, Zhangquanhu, Rongju Ruan, Xinyi Dai, Yasheng Wang, Ruiming Tang
Abstract
Code generation is a core capability of large language models (LLMs), yet mainstream benchmarks (e.g., APPS and LiveCodeBench) contain questions of medium difficulty that pose no real challenge to advanced LLMs. To better reflect advanced reasoning and code generation ability, we introduce Humanity’s Last Code Exam (HLCE), comprising 235 of the most challenging problems from the International Collegiate Programming Contest (ICPC) World Finals and the International Olympiad in Informatics (IOI), spanning 2010–2024. As part of HLCE, we design a harmonized online–offline sandbox that guarantees fully reproducible evaluation. Through comprehensive evaluation, we observe that even the strongest reasoning LLMs, o4-mini (high) and Gemini-2.5 Pro, achieve pass@1 rates of only 15.9% and 11.4%, respectively. Meanwhile, we propose a novel “self-recognition” task to measure LLMs’ awareness of their own capabilities. Results indicate that LLMs’ self-recognition abilities are not proportionally correlated with their code generation performance. Finally, our empirical validation of test-time scaling laws reveals that current advanced LLMs still have substantial room for improvement on complex programming tasks. We expect HLCE to become a milestone challenge for code generation and to catalyze advances in high-performance reasoning and human–AI collaborative programming. Our code and dataset are also publicly available¹.

¹ https://github.com/Humanity-s-Last-Code-Exam/HLCE
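The abstract does not state how the pass@1 numbers are computed; a common choice, and the assumption behind this minimal sketch, is the unbiased pass@k estimator of Chen et al. (2021), applied here with k = 1 over n sampled solutions per problem, c of which pass all hidden tests.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples,
    drawn without replacement from n generations (c of them correct),
    solves the problem. Assumed protocol, not taken from the paper itself."""
    if n - c < k:
        # Fewer incorrect samples than k: some correct sample is always drawn.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 2 of 16 sampled solutions pass -> pass@1 = 2/16 = 0.125
print(pass_at_k(n=16, c=2, k=1))
```

Benchmark-level pass@1 would then be the mean of this per-problem estimate over all 235 HLCE problems.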
- Anthology ID: 2025.findings-emnlp.1152
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2025
- Month: November
- Year: 2025
- Address: Suzhou, China
- Editors: Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 21122–21137
- URL: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1152/
- DOI: 10.18653/v1/2025.findings-emnlp.1152
- Cite (ACL): Xiangyang Li, Xiaopeng Li, Kuicai Dong, Zhangquanhu, Rongju Ruan, Xinyi Dai, Yasheng Wang, and Ruiming Tang. 2025. Humanity’s Last Code Exam: Can Advanced LLMs Conquer Human’s Hardest Code Competition?. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 21122–21137, Suzhou, China. Association for Computational Linguistics.
- Cite (Informal): Humanity’s Last Code Exam: Can Advanced LLMs Conquer Human’s Hardest Code Competition? (Li et al., Findings 2025)
- PDF: https://preview.aclanthology.org/author-page-yu-wang-polytechnic/2025.findings-emnlp.1152.pdf