Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists

Michał Pietruszka, Łukasz Borchmann, Aleksander Jędrosz, Paweł Morawiecki


Abstract
We present FeatEng, a novel benchmark designed to evaluate the ability of large language models (LLMs) to perform feature engineering, a critical and knowledge-intensive task in data science. FeatEng assesses LLMs by their capacity to generate Python code that transforms raw tabular data into features that improve the performance of a downstream machine learning model. Our analysis of LLM outputs reveals that success on FeatEng often requires the application of significant world and domain knowledge, along with complex reasoning, to construct novel data representations. While focused on feature engineering, the benchmark probes a confluence of abilities indicative of an LLM’s broader potential for practical, data-centric problem-solving. We demonstrate that FeatEng offers a targeted and efficient approach to assess a specific but crucial aspect of LLM capabilities relevant to real-world data science applications.
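To make the evaluation protocol concrete, below is a minimal sketch of the loop the abstract describes: an LLM emits a `transform(df)` function over a raw table, and the benchmark scores it by the change in a fixed downstream model's performance. This is not FeatEng's actual harness; the dataset (`housing.csv`), column names, target, model choice, and helper names are all hypothetical illustrations.

```python
# Hedged sketch of the abstract's evaluation loop; all names are illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def transform(df: pd.DataFrame) -> pd.DataFrame:
    """The kind of code an LLM might emit: a domain-informed derived feature."""
    df = df.copy()
    df["rooms_per_person"] = df["total_rooms"] / df["population"].clip(lower=1)
    return df

def downstream_score(df: pd.DataFrame, target: str) -> float:
    """Cross-validated score of a fixed model on the table's numeric columns."""
    X = df.drop(columns=[target]).select_dtypes("number").fillna(0)
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(model, X, df[target], cv=5).mean()

raw = pd.read_csv("housing.csv")  # any tabular dataset with a target column
baseline = downstream_score(raw, target="price_band")       # raw columns only
engineered = downstream_score(transform(raw), target="price_band")
print(f"baseline={baseline:.3f}  with LLM features={engineered:.3f}")
```

Keeping the downstream model fixed isolates the quality of the generated features themselves, which is what lets the benchmark attribute score gains to the LLM's world knowledge and reasoning rather than to model tuning.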
Anthology ID: 2026.findings-eacl.308
Volume: Findings of the Association for Computational Linguistics: EACL 2026
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Vera Demberg, Kentaro Inui, Lluís Màrquez
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5862–5886
URL: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.308/
Cite (ACL): Michał Pietruszka, Łukasz Borchmann, Aleksander Jędrosz, and Paweł Morawiecki. 2026. Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists. In Findings of the Association for Computational Linguistics: EACL 2026, pages 5862–5886, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists (Pietruszka et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.308.pdf
Checklist: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.308.checklist.pdf