CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning

Van-Quang Nguyen, Takayuki Okatani


Abstract
Existing datasets for multimodal table understanding, such as MMTab, primarily provide short factual answers without explicit multi-step reasoning supervision. Models trained on these datasets often generate brief responses that offer insufficient accuracy and limited insight into how the models arrive at the final answer. We introduce CoReTab, a code-driven reasoning framework that produces scalable, interpretable, and automatically verifiable annotations by coupling multi-step reasoning with executable Python code. Using the CoReTab framework, we curate a dataset of 115K verified samples averaging 529 tokens per response and fine-tune open-source MLLMs through a three-stage pipeline. We evaluate the resulting model across 17 MMTab benchmarks spanning table question answering, fact verification, and table structure understanding. Our model achieves significant gains of +6.2%, +5.7%, and +25.6%, respectively, over MMTab-trained baselines, while producing transparent and verifiable reasoning traces. These results establish CoReTab as a robust and generalizable supervision framework for improving multi-step reasoning in multimodal table understanding.
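As a rough illustration of what "automatically verifiable" could mean when reasoning is paired with executable Python code, the sketch below runs a candidate code snippet against a small table and keeps the annotation only if the computed result matches the reference answer. This is not the authors' pipeline: the function names (`run_annotation_code`, `is_verified`), the convention that the snippet stores its result in a variable called `answer`, the string-based comparison, and the toy table are all assumptions made for this example.

```python
def run_annotation_code(code: str, table: list[dict]) -> object:
    """Execute candidate code in a fresh namespace and return its `answer` variable.

    Hypothetical convention: the snippet reads `table` and assigns `answer`.
    """
    namespace = {"table": table}
    # Illustration only: real untrusted code would need a proper sandbox.
    exec(code, namespace)
    return namespace.get("answer")


def is_verified(code: str, table: list[dict], gold_answer: object) -> bool:
    """Return True only if the code runs and reproduces the reference answer."""
    try:
        result = run_annotation_code(code, table)
    except Exception:
        # Code that crashes cannot be verified, so the sample is rejected.
        return False
    # Assumed matching rule: case-insensitive comparison of string forms.
    return str(result).strip().lower() == str(gold_answer).strip().lower()


# Toy table and candidate annotation (both invented for this sketch).
table = [
    {"city": "Oslo", "population": 700},
    {"city": "Bergen", "population": 290},
]
candidate_code = "answer = sum(row['population'] for row in table)"

print(is_verified(candidate_code, table, 990))
```

Under these assumptions, a sample whose code fails to run or yields a different value would simply be filtered out, which is one plausible way a dataset of "verified samples" could be assembled.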
Anthology ID:
2026.eacl-long.306
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
6498–6523
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.306/
Cite (ACL):
Van-Quang Nguyen and Takayuki Okatani. 2026. CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6498–6523, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
CoReTab: Improving Multimodal Table Understanding with Code-driven Reasoning (Nguyen & Okatani, EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.306.pdf