CodeComplex: Dataset for Worst-Case Time Complexity Prediction

SeungYeop Baik, Joonghyuk Hahn, Jungin Kim, Aditi, Mingi Jeon, Yo-Sub Han, Sang-Ki Ko


Abstract
Reasoning is a crucial ability of large language models (LLMs), especially in complex decision-making tasks. One significant task that demonstrates LLMs' reasoning capability is code time complexity prediction, which involves various intricate factors such as the input ranges of variables and conditional loops. Current benchmarks fall short of providing a rigorous assessment due to limited data, language constraints, and insufficient labeling. They do not consider time complexity based on input representation and merely evaluate whether predictions fall into the same class, lacking a measure of how close incorrect predictions are to the correct ones. To address these limitations, we introduce CodeComplex, the first robust and extensive dataset designed to evaluate LLMs' reasoning abilities in predicting code time complexity. CodeComplex comprises 4,900 Java codes and an equivalent number of Python codes, overcoming language and labeling constraints, carefully annotated with complexity labels based on input characteristics by a panel of algorithmic experts. Additionally, we propose specialized evaluation metrics for the reasoning of complexity prediction tasks, offering a more precise and reliable assessment of LLMs' reasoning capabilities. We release our dataset and baseline models publicly to encourage the relevant (NLP, SE, and PL) communities to utilize and participate in this research. Our code and data are available at https://github.com/sybaik1/CodeComplex.
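The abstract points out that exact-match accuracy ignores how close a wrong prediction is to the gold class. Below is a minimal Python sketch of what a distance-aware score could look like, assuming an ordinal encoding of seven standard complexity classes; the class list mirrors common complexity hierarchies, and the linear-decay formula is an illustrative assumption, not the metric defined in the paper.

    # Hypothetical sketch of a class-distance-aware score for complexity
    # prediction. The linear-decay formula is an illustrative assumption,
    # not the paper's actual metric.

    CLASSES = ["O(1)", "O(log n)", "O(n)", "O(n log n)",
               "O(n^2)", "O(n^3)", "exponential"]
    RANK = {c: i for i, c in enumerate(CLASSES)}

    def distance_aware_score(gold: str, pred: str) -> float:
        """Return 1.0 for an exact match, decaying linearly with the
        ordinal distance between predicted and gold classes."""
        d = abs(RANK[gold] - RANK[pred])
        return 1.0 - d / (len(CLASSES) - 1)

    # Predicting O(n^2) for an O(n log n) program is penalized less than
    # predicting O(1), since it is closer in the complexity hierarchy.
    print(distance_aware_score("O(n log n)", "O(n log n)"))  # 1.0
    print(distance_aware_score("O(n log n)", "O(n^2)"))      # ~0.83
    print(distance_aware_score("O(n log n)", "O(1)"))        # 0.5

Under this kind of scoring, a model that confuses adjacent classes still earns partial credit, giving a finer-grained picture of reasoning quality than a binary exact-match accuracy.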
Anthology ID:
2025.findings-emnlp.1069
Volume:
Findings of the Association for Computational Linguistics: EMNLP 2025
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, Violet Peng
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
19616–19638
URL:
https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.1069/
DOI:
10.18653/v1/2025.findings-emnlp.1069
Cite (ACL):
SeungYeop Baik, Joonghyuk Hahn, Jungin Kim, Aditi, Mingi Jeon, Yo-Sub Han, and Sang-Ki Ko. 2025. CodeComplex: Dataset for Worst-Case Time Complexity Prediction. In Findings of the Association for Computational Linguistics: EMNLP 2025, pages 19616–19638, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
CodeComplex: Dataset for Worst-Case Time Complexity Prediction (Baik et al., Findings 2025)
PDF:
https://preview.aclanthology.org/ingest-luhme/2025.findings-emnlp.1069.pdf
Checklist:
2025.findings-emnlp.1069.checklist.pdf