PECC: Problem Extraction and Coding Challenges

Patrick Haller; Jonas Golde; Alan Akbik

Abstract

Recent advancements in large language models (LLMs) have showcased their exceptional abilities across various tasks, such as code generation, problem-solving, and reasoning. Existing benchmarks evaluate these tasks in isolation, yet the extent to which LLMs can understand prose-style tasks, identify the underlying problems, and then generate appropriate code solutions is still unexplored. Addressing this gap, we introduce PECC, a novel benchmark of 2396 problems derived from Advent of Code (AoC) challenges and Project Euler. Unlike conventional benchmarks, PECC requires LLMs to interpret narrative-embedded problems, extract requirements, and generate executable code. A key feature of our dataset is the complexity added by natural language prompting in chat-based evaluations, mirroring real-world instruction ambiguities. Results show that model performance varies between narrative and neutral problems, and that the math-based Euler subset is particularly challenging: GPT-3.5-Turbo passes 50% of the AoC challenges but only 8% of the Euler problems. By probing the limits of LLMs’ capabilities, our benchmark provides a framework to monitor and assess the subsequent progress of LLMs as universal problem solvers.

Anthology ID: 2024.lrec-main.1111
Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month: May
Year: 2024
Address: Torino, Italia
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues: LREC | COLING
Publisher: ELRA and ICCL
Pages: 12690–12699
URL: https://preview.aclanthology.org/landing_page/2024.lrec-main.1111/
Cite (ACL): Patrick Haller, Jonas Golde, and Alan Akbik. 2024. PECC: Problem Extraction and Coding Challenges. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12690–12699, Torino, Italia. ELRA and ICCL.
Cite (Informal): PECC: Problem Extraction and Coding Challenges (Haller et al., LREC-COLING 2024)
PDF: https://preview.aclanthology.org/landing_page/2024.lrec-main.1111.pdf
