Question Answering over Tabular Data with DataBench: A Large-Scale Empirical Evaluation of LLMs
Jorge Osés Grijalba, L. Alfonso Ureña-López, Eugenio Martínez Cámara, Jose Camacho-Collados
Abstract
Large Language Models (LLMs) are showing emerging abilities, and one of the latest recognized ones deals with their ability to reason and answer questions from tabular data. Although there are some available datasets to assess question answering systems on tabular data, they are not large and diverse enough to properly assess the capabilities of LLMs. To this end, we propose DataBench, a benchmark composed of 65 real-world datasets over several domains, including 20 human-generated questions per dataset, totaling 1300 questions and answers overall. Using this benchmark, we perform a large-scale empirical comparison of several open and closed source models, including both code-generating and in-context learning models. The results highlight the current gap between open-source and closed-source models, with all types of model having room for improvement even in simple boolean questions or involving a single column.- Anthology ID:
- 2024.lrec-main.1179
- Volume:
- Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues:
- LREC | COLING
- SIG:
- Publisher:
- ELRA and ICCL
- Note:
- Pages:
- 13471–13488
- Language:
- URL:
- https://aclanthology.org/2024.lrec-main.1179
- DOI:
- Cite (ACL):
- Jorge Osés Grijalba, L. Alfonso Ureña-López, Eugenio Martínez Cámara, and Jose Camacho-Collados. 2024. Question Answering over Tabular Data with DataBench: A Large-Scale Empirical Evaluation of LLMs. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 13471–13488, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- Question Answering over Tabular Data with DataBench: A Large-Scale Empirical Evaluation of LLMs (Osés Grijalba et al., LREC-COLING 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2024.lrec-main.1179.pdf