LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation

Xinrui He; Yikun Ban; Jiaru Zou; Tianxin Wei; Curtiss Cook; Jingrui He

Abstract

Missing data imputation is a critical challenge in domains such as healthcare and finance, where data completeness is vital for accurate analysis. Large language models (LLMs), trained on vast corpora, have shown strong potential in data generation, which makes them a promising tool for data imputation. However, two challenges persist: designing effective prompts for a finetuning-free process, and mitigating bias and uncertainty in LLM outputs. To address these issues, and inspired by ensemble learning methods such as Random Forest, we propose a novel framework, LLM-Forest, which introduces a “forest” of few-shot learning LLM “trees” whose outputs are aggregated via confidence-based weighted voting, with the confidence derived from each LLM's self-assessment. The framework is built on a new concept, bipartite information graphs, which identify high-quality, relevant neighboring entries at both feature and value granularity. Extensive experiments on 9 real-world datasets demonstrate the effectiveness and efficiency of LLM-Forest. The implementation is available at https://github.com/Xinrui17/LLM-Forest
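The abstract does not spell out the exact voting rule. As a minimal sketch, assuming each LLM "tree" returns a categorical imputed value together with a self-reported confidence score in [0, 1], confidence-based weighted voting could sum the confidences per candidate value and keep the value with the highest total. The function name `weighted_vote`, the `(value, confidence)` output format, and the clamping step are hypothetical illustrations, not taken from the LLM-Forest implementation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical output format for one "tree" (one few-shot LLM call):
# (imputed_value, self_reported_confidence), confidence assumed in [0, 1].
TreeOutput = Tuple[Hashable, float]


def weighted_vote(outputs: Sequence[TreeOutput]) -> Hashable:
    """Return the candidate value with the largest total confidence.

    Illustrative sketch of confidence-weighted voting for a categorical
    feature; ties go to the value that appeared first in `outputs`.
    """
    if not outputs:
        raise ValueError("need at least one tree output")
    scores: Dict[Hashable, float] = defaultdict(float)
    for value, confidence in outputs:
        # Clamp to [0, 1] so a malformed self-assessment cannot go
        # negative or outweigh the other trees arbitrarily.
        scores[value] += min(max(confidence, 0.0), 1.0)
    # Dicts keep insertion order and max() returns the first maximal key,
    # which gives the first-appearance tie-breaking described above.
    return max(scores, key=scores.__getitem__)


if __name__ == "__main__":
    trees = [("yes", 0.95), ("no", 0.3), ("no", 0.3)]
    print(weighted_vote(trees))
```

In the usage line, plain majority voting would pick "no" (two votes to one), whereas the confidence-weighted rule picks "yes" because 0.95 exceeds 0.3 + 0.3. Continuous features would need a different aggregation, such as a confidence-weighted mean; that choice is likewise an assumption here.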

Anthology ID: 2025.findings-acl.361
Volume: Findings of the Association for Computational Linguistics: ACL 2025
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 6921–6936
URL: https://preview.aclanthology.org/landing_page/2025.findings-acl.361/
Cite (ACL): Xinrui He, Yikun Ban, Jiaru Zou, Tianxin Wei, Curtiss Cook, and Jingrui He. 2025. LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation. In Findings of the Association for Computational Linguistics: ACL 2025, pages 6921–6936, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): LLM-Forest: Ensemble Learning of LLMs with Graph-Augmented Prompts for Data Imputation (He et al., Findings 2025)
PDF: https://preview.aclanthology.org/landing_page/2025.findings-acl.361.pdf