Hemolix.TabGen: Optimized Table Generation from Documents

Gyanendra Shrestha; Todor Ivanov; Karthik Vemireddy; Anna Pyayt; Michael Gubanov

Abstract

Modern data lakes contain vast and heterogeneous document collections, making table generation from documents a persistent and nontrivial challenge. Traditional approaches are often rigid: they are domain-specific, require extensive supervision, or are limited to a set of pre-defined schemas. LLM-based approaches are more flexible, but typically suffer from hallucinations, non-determinism, and high computational costs. To overcome these limitations, we introduce Hemolix.TabGen, a novel, scalable LLM-based table generation system that comprehends documents and generates two-dimensional tables based on the entire document content. We evaluated TabGen on 4 publicly available datasets spanning multiple domains and observed an Average Precision gain of up to 30% compared to vanilla LLMs.

Anthology ID: 2026.acl-industry.73
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Yunyao Li, Georg Rehm, Mei Tu
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 1055–1066
URL: https://preview.aclanthology.org/ingestion-form-platform/2026.acl-industry.73/
Cite (ACL): Gyanendra Shrestha, Todor Ivanov, Karthik Vemireddy, Anna Pyayt, and Michael Gubanov. 2026. Hemolix.TabGen: Optimized Table Generation from Documents. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track), pages 1055–1066, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Hemolix.TabGen: Optimized Table Generation from Documents (Shrestha et al., ACL 2026)
PDF: https://preview.aclanthology.org/ingestion-form-platform/2026.acl-industry.73.pdf
