StructuralLM: Structural Pre-training for Form Understanding

Chenliang Li, Bin Bi, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si


Abstract
Large pre-trained language models achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, they almost exclusively focus on text-only representation, while neglecting cell-level layout information that is important for form image understanding. In this paper, we propose a new pre-training approach, StructuralLM, to jointly leverage cell and layout information from scanned documents. Specifically, we pre-train StructuralLM with two new designs to make the most of the interactions of cell and layout information: 1) each cell as a semantic unit; 2) classification of cell positions. The pre-trained StructuralLM achieves new state-of-the-art results in different types of downstream tasks, including form understanding (from 78.95 to 85.14), document visual question answering (from 72.59 to 83.94) and document image classification (from 94.43 to 96.08).
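A rough sketch of the abstract's two designs, written in PyTorch under assumptions of our own (coordinates normalized to 0-1000 as in LayoutLM-style models, a 4x4 grid of image areas, and placeholder hyper-parameters); it illustrates the idea rather than reproducing the authors' implementation in alibaba/AliceMind. Design 1: every token inherits the 2D position embedding of the cell it belongs to, so the cell behaves as one semantic unit. Design 2: an auxiliary head classifies which image area each cell occupies.

import torch
import torch.nn as nn

class CellLayoutEmbedding(nn.Module):
    # Design 1: all tokens of a cell share that cell's bounding-box embedding,
    # so the cell acts as a single semantic unit.
    def __init__(self, vocab_size=30522, hidden_size=768, max_coord=1001):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, hidden_size)
        self.x_emb = nn.Embedding(max_coord, hidden_size)  # shared for x0 and x1
        self.y_emb = nn.Embedding(max_coord, hidden_size)  # shared for y0 and y1

    def forward(self, token_ids, cell_boxes, token_to_cell):
        # token_ids:     (seq_len,) word-piece ids
        # cell_boxes:    (num_cells, 4) integer [x0, y0, x1, y1] in 0..1000
        # token_to_cell: (seq_len,) index of the cell each token came from
        boxes = cell_boxes[token_to_cell]  # broadcast each cell's box to its tokens
        layout = (self.x_emb(boxes[:, 0]) + self.y_emb(boxes[:, 1])
                  + self.x_emb(boxes[:, 2]) + self.y_emb(boxes[:, 3]))
        return self.word_emb(token_ids) + layout

class CellPositionHead(nn.Module):
    # Design 2: predict which of num_areas image regions a cell falls into,
    # used as an auxiliary classification objective during pre-training.
    def __init__(self, hidden_size=768, num_areas=16):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_areas)

    def forward(self, cell_states):
        # cell_states: (num_cells, hidden_size) pooled encoder output per cell
        return self.classifier(cell_states)

# Toy usage with two cells and four word pieces:
emb = CellLayoutEmbedding()
tokens = torch.tensor([101, 2054, 2003, 102])
boxes = torch.tensor([[0, 0, 500, 100], [500, 0, 1000, 100]])
token_to_cell = torch.tensor([0, 0, 1, 1])   # first two tokens belong to cell 0
hidden = emb(tokens, boxes, token_to_cell)   # shape (4, 768)

The hidden-size, vocabulary, coordinate range, and 16-way area grid above are illustrative choices, not values taken from the paper.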
Anthology ID:
2021.acl-long.493
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Editors:
Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
6309–6318
URL:
https://aclanthology.org/2021.acl-long.493
DOI:
10.18653/v1/2021.acl-long.493
Cite (ACL):
Chenliang Li, Bin Bi, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, and Luo Si. 2021. StructuralLM: Structural Pre-training for Form Understanding. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 6309–6318, Online. Association for Computational Linguistics.
Cite (Informal):
StructuralLM: Structural Pre-training for Form Understanding (Li et al., ACL-IJCNLP 2021)
PDF:
https://preview.aclanthology.org/ingest-acl-2023-videos/2021.acl-long.493.pdf
Video:
https://preview.aclanthology.org/ingest-acl-2023-videos/2021.acl-long.493.mp4
Code:
alibaba/AliceMind
Data:
DocVQA | FUNSD | RVL-CDIP