Anna Pyayt
2026
Hemolix.TabGen: Optimized Table Generation from Documents
Gyanendra Shrestha | Todor Ivanov | Karthik Vemireddy | Anna Pyayt | Michael Gubanov
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Modern data lakes contain vast and heterogeneous document collections, making table generation from documents a persistent and nontrivial challenge. Traditional approaches are often rigid: they are domain-specific, require extensive supervision, or are limited to a set of predefined schemas. LLM-based approaches are more flexible, but typically suffer from hallucinations, non-determinism, and high computational costs. To overcome these limitations, we introduce Hemolix.TabGen, a novel, scalable LLM-based table generation system that comprehends documents and generates two-dimensional tables based on the entire document content. We evaluated TabGen on 4 publicly available datasets spanning multiple domains and observed an Average Precision improvement of up to 30% compared to vanilla LLMs.