Can Multi-turn Self-refined Single Agent LMs with Retrieval Solve Hard Coding Problems?

Md Tanzib Hosain; Md Kishor Morol


Abstract

Competitive programming problems are among the hardest tasks for humans: they require sophisticated algorithmic thinking, puzzle solving, and the creation of effective code. As a domain for assessing language models (LMs), however, competitive programming has not received enough attention. This study presents the ICPC benchmark, which consists of 254 International Collegiate Programming Contest (ICPC) tasks. Each problem includes official analysis, reference code, sample tests, and high-quality unit and hidden tests. With these resources, we develop and evaluate a variety of LM inference techniques for competitive programming. With zero-shot chain-of-thought prompting, we find that o1 achieves only a 19.1% pass@1 solve rate. Our best inference technique, which combines multi-turn self-judge with reflection and retrieval over episodic information, raises this to 42.2%. Furthermore, we conduct a new human-in-the-loop investigation to gain a deeper understanding of the remaining difficulties. Surprisingly, we discover that, given just a few specific instructions, o1 can solve 17 out of 18 problems that were previously unsolvable by any model or technique. Our quantitative findings and qualitative research provide a step toward LMs with grounded, imaginative, and algorithmic thinking. We open source our code at https://github.com/kraritt/zolve.

Anthology ID: 2025.acl-srw.8
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Jin Zhao, Mingyang Wang, Zhu Liu
Venues: ACL | WS
Publisher: Association for Computational Linguistics
Pages: 129–142
URL: https://preview.aclanthology.org/landing_page/2025.acl-srw.8/
Cite (ACL): Md Tanzib Hosain and Md Kishor Morol. 2025. Can Multi-turn Self-refined Single Agent LMs with Retrieval Solve Hard Coding Problems? In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 129–142, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): Can Multi-turn Self-refined Single Agent LMs with Retrieval Solve Hard Coding Problems? (Hosain & Morol, ACL 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.acl-srw.8.pdf
