Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists
Michał Pietruszka, Łukasz Borchmann, Aleksander Jędrosz, Paweł Morawiecki
Abstract
We present FeatEng, a novel benchmark designed to evaluate the ability of large language models (LLMs) to perform feature engineering, a critical and knowledge-intensive task in data science. FeatEng assesses LLMs by their capacity to generate Python code that transforms raw tabular data into features that improve the performance of a downstream machine learning model. Our analysis of LLM outputs reveals that success on FeatEng often requires the application of significant world and domain knowledge, along with complex reasoning, to construct novel data representations. While focused on feature engineering, the benchmark probes a confluence of abilities indicative of an LLM’s broader potential for practical, data-centric problem-solving. We demonstrate that FeatEng offers a targeted and efficient approach to assess a specific but crucial aspect of LLM capabilities relevant to real-world data science applications.
- Anthology ID: 2026.findings-eacl.308
- Volume: Findings of the Association for Computational Linguistics: EACL 2026
- Month: March
- Year: 2026
- Address: Rabat, Morocco
- Editors: Vera Demberg, Kentaro Inui, Lluís Marquez
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 5862–5886
- URL: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.308/
- Cite (ACL): Michał Pietruszka, Łukasz Borchmann, Aleksander Jędrosz, and Paweł Morawiecki. 2026. Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists. In Findings of the Association for Computational Linguistics: EACL 2026, pages 5862–5886, Rabat, Morocco. Association for Computational Linguistics.
- Cite (Informal): Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists (Pietruszka et al., Findings 2026)
- PDF: https://preview.aclanthology.org/ingest-eacl/2026.findings-eacl.308.pdf