RelationalCoder: Rethinking Complex Tables via Programmatic Relational Transformation

Haoyu Dong, Yue Hu, Huailiang Peng, Yanan Cao


Abstract
Semi-structured tables, with their varied layouts and formatting artifacts, remain a major obstacle for automated data processing and analytics. To address these challenges, we propose RelationalCoder, which uniformly converts semi-structured tables into relational data, enabling smooth integration with the rich ecosystem of data processing and analytics tools. By leveraging SQL code, RelationalCoder prevents schema errors and markedly improves normalization quality across multiple relational tables. To address the challenge of large tables, we propose a new technique called Loop Reference Decoding (LRD): it identifies expandable groups (repeating regions of similar structure and semantics) and replicates each group using a concise loop over its repetitive region by referencing cell addresses, rather than regenerating each individual cell. This design substantially reduces output length from 𝒪(N × M), which is proportional to the table’s height (N) and width (M), to approximately 𝒪(K), where K is the total number of unique cell types within detected expandable groups. As a result, LRD is highly scalable: the larger the input table, the greater the compression ratio. It scales to extremely large tables, achieving output reductions of up to 100,000×. We further build the first human-labeled corpus for table transformation, using a cost-efficient, actively supervised pipeline. Extensive experiments on HiTab and MultiHiertt show that RelationalCoder not only enables programmatic symbolic reasoning but also boosts QA accuracy, raising Llama-2 and Mistral models by more than 20% and GPT-4o by over 4%. Project page: https://github.com/haoyudong/RelationalCoder.
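The idea behind Loop Reference Decoding can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `LoopSpec` class, the `expand` function, and the toy grid are hypothetical names and data introduced here only to show how a short loop over cell addresses can stand in for emitting every cell of a repeating region.

```python
from dataclasses import dataclass


@dataclass
class LoopSpec:
    """Hypothetical description of one expandable group (not from the paper)."""
    first_row: int           # first row of the repeating region (0-based)
    last_row: int            # last row of the repeating region (inclusive)
    columns: dict[str, int]  # output column name -> source column index


def expand(grid, spec):
    """Materialize relational rows by referencing cell addresses in a loop."""
    return [
        {name: grid[r][c] for name, c in spec.columns.items()}
        for r in range(spec.first_row, spec.last_row + 1)
    ]


# Toy table: one header row followed by 1000 structurally identical data rows.
grid = [["Region", "Year", "Sales"]] + [
    [f"R{i}", 2020 + i % 3, i * 10] for i in range(1000)
]

# The spec names only the three unique cell types and the row range,
# so its size does not grow with the number of rows in the table.
spec = LoopSpec(
    first_row=1,
    last_row=len(grid) - 1,
    columns={"region": 0, "year": 1, "sales": 2},
)

rows = expand(grid, spec)
```

In this sketch the specification holds three column references however many rows the region spans, whereas writing the rows out cell by cell would grow with N × M. That is the intuition behind the 𝒪(K) output length described above, under the assumption that the repeating region has already been detected.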
Anthology ID:
2025.acl-long.89
Volume:
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
July
Year:
2025
Address:
Vienna, Austria
Editors:
Wanxiang Che, Joyce Nabende, Ekaterina Shutova, Mohammad Taher Pilehvar
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
1771–1784
URL:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.89/
Cite (ACL):
Haoyu Dong, Yue Hu, Huailiang Peng, and Yanan Cao. 2025. RelationalCoder: Rethinking Complex Tables via Programmatic Relational Transformation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1771–1784, Vienna, Austria. Association for Computational Linguistics.
Cite (Informal):
RelationalCoder: Rethinking Complex Tables via Programmatic Relational Transformation (Dong et al., ACL 2025)
PDF:
https://preview.aclanthology.org/ingestion-acl-25/2025.acl-long.89.pdf