Enhanced Table Structure Recognition with Multi-Modal Approach

Huichen Yang, Andrew D. Hellicar, Maciej Rybinski, Sarvnaz Karimi


Abstract
Tables are fundamental for presenting information in research articles, technical documents, manuals, and reports. One key challenge is accessing the information in tables that are embedded in Portable Document Format (PDF) files or scanned images. Doing so requires accurately recognising table structures across diverse layouts, including complex tables. The Table Structure Recognition (TSR) task aims to recognise the internal structure of table images and convert them into a machine-readable format. We propose a flexible multi-modal framework for image-based TSR. Our approach employs two-stream transformer encoders alongside task-specific decoders for table structure extraction and cell bounding box detection. Experiments on benchmark datasets demonstrate that our model achieves highly competitive results compared to strong baselines, gaining 5.4% over single-modality approaches on the FinTabNet dataset.
Anthology ID:
2025.wasp-main.23
Volume:
Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications
Month:
December
Year:
2025
Address:
Mumbai, India and virtual
Editors:
Alberto Accomazzi, Tirthankar Ghosal, Felix Grezes, Kelly Lockhart
Venues:
WASP | WS
Publisher:
Association for Computational Linguistics
Pages:
201–207
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.23/
Cite (ACL):
Huichen Yang, Andrew D. Hellicar, Maciej Rybinski, and Sarvnaz Karimi. 2025. Enhanced Table Structure Recognition with Multi-Modal Approach. In Proceedings of the Third Workshop for Artificial Intelligence for Scientific Publications, pages 201–207, Mumbai, India and virtual. Association for Computational Linguistics.
Cite (Informal):
Enhanced Table Structure Recognition with Multi-Modal Approach (Yang et al., WASP 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.wasp-main.23.pdf