TableCoder: Table Extraction from Text via Reliable Code Generation

Haoyu Dong, Yue Hu, Huailiang Peng, Yanan Cao


Abstract
This paper introduces a task aimed at extracting structured tables from text using natural language (NL) instructions. We present TableCoder, an approach that leverages the symbolic nature of code to enhance the robustness of table structure construction and content extraction. TableCoder first generates Python classes or SQL statements to explicitly construct table structures, capturing semantic ontology, computational dependencies, numerical properties, and format strings. This approach reliably mitigates issues such as structural errors, erroneous computations, and mismatched value types. TableCoder then performs grounded content extraction, populating table cells sequentially in the exact order in which they are mentioned in the source text. By simulating a grounded "translation" from text to code, this method reduces the likelihood of omissions and hallucinations. Experimental results demonstrate that TableCoder significantly improves F1 scores and reduces hallucinations and computational errors, which is crucial for high-stakes applications such as government data analytics and financial compliance reporting. Moreover, the code-generation-based method integrates naturally with standard SQL databases and Python workflows, enabling seamless deployment in existing enterprise data pipelines.
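The two steps described in the abstract can be illustrated with a minimal Python sketch. This is not TableCoder's implementation, prompt format, or output: the `RevenueRow` schema, the `grounded_fill` helper, and the sample sentence are hypothetical. The class stands in for generated structure code (typed columns, a derived column computed by code rather than extracted, and format strings), and the helper mimics grounded extraction by requiring that each extracted value appear in the source text, in the same order in which the cells are filled.

```python
from dataclasses import dataclass


@dataclass
class RevenueRow:
    """Hypothetical generated table structure (illustrative, not from the paper)."""
    region: str            # key column (semantic ontology)
    revenue_2023: float    # numeric column, USD millions
    revenue_2024: float    # numeric column, USD millions

    def __post_init__(self):
        # Reject mismatched value types instead of silently storing strings.
        for name in ("revenue_2023", "revenue_2024"):
            value = getattr(self, name)
            if not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be numeric, got {value!r}")

    @property
    def growth_pct(self) -> float:
        # Computational dependency: derived by code, never copied from the text.
        return (self.revenue_2024 - self.revenue_2023) / self.revenue_2023 * 100

    def render(self) -> dict:
        # Format strings belong to the schema, not to the extracted values.
        return {
            "Region": self.region,
            "2023 ($M)": f"{self.revenue_2023:,.1f}",
            "2024 ($M)": f"{self.revenue_2024:,.1f}",
            "Growth": f"{self.growth_pct:+.1f}%",
        }


def grounded_fill(text: str, spans: list[str]) -> list[str]:
    """Hypothetical check: each span must occur in `text` after the previous one."""
    cursor = 0
    for span in spans:
        idx = text.find(span, cursor)
        if idx == -1:
            raise ValueError(f"{span!r} not found after offset {cursor}")
        cursor = idx + len(span)
    return spans


text = "In the North region, revenue rose from 120.0 million in 2023 to 150.0 million in 2024."
region, r23, r24 = grounded_fill(text, ["North", "120.0", "150.0"])
row = RevenueRow(region, float(r23), float(r24))
print(row.render())
# {'Region': 'North', '2023 ($M)': '120.0', '2024 ($M)': '150.0', 'Growth': '+25.0%'}
```

In this sketch, the growth column is computed rather than extracted, so it cannot be hallucinated; a value that is missing from the text or filled out of order raises an error instead of entering the table, and a non-numeric value in a numeric column is rejected at construction time.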
Anthology ID: 2025.acl-industry.98
Volume: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month: July
Year: 2025
Address: Vienna, Austria
Editors: Georg Rehm, Yunyao Li
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1399–1412
URL: https://preview.aclanthology.org/display_plenaries/2025.acl-industry.98/
Cite (ACL): Haoyu Dong, Yue Hu, Huailiang Peng, and Yanan Cao. 2025. TableCoder: Table Extraction from Text via Reliable Code Generation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 1399–1412, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal): TableCoder: Table Extraction from Text via Reliable Code Generation (Dong et al., ACL 2025)
PDF: https://preview.aclanthology.org/display_plenaries/2025.acl-industry.98.pdf