Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis
Siwen Luo, Yihao Ding, Siqu Long, Josiah Poon, Soyeon Caren Han
Abstract
Recognizing the layout of unstructured digital documents is crucial when parsing them into a structured, machine-readable format for downstream applications. Recent studies in Document Layout Analysis (DLA) usually rely on visual cues to understand documents while ignoring other information, such as contextual information or the relationships between document layout components, which is vital for better layout analysis performance. Our Doc-GCN presents an effective way to harmonize and integrate heterogeneous aspects for Document Layout Analysis. We construct different graphs to capture the four main feature aspects of document layout components: syntactic, semantic, density, and appearance features. Then, we apply graph convolutional networks to enhance each aspect of the features and apply node-level pooling for integration. Finally, we concatenate the features of all aspects and feed them into 2-layer MLPs for document layout component classification. Our Doc-GCN achieves state-of-the-art results on three widely used DLA datasets: PubLayNet, FUNSD, and DocBank. The code will be released at https://github.com/adlnlp/doc_gcn
- Anthology ID:
- 2022.coling-1.256
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue:
- COLING
- Publisher:
- International Committee on Computational Linguistics
- Pages:
- 2906–2916
- URL:
- https://aclanthology.org/2022.coling-1.256
- Cite (ACL):
- Siwen Luo, Yihao Ding, Siqu Long, Josiah Poon, and Soyeon Caren Han. 2022. Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2906–2916, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- Doc-GCN: Heterogeneous Graph Convolutional Networks for Document Layout Analysis (Luo et al., COLING 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/2022.coling-1.256.pdf
- Code:
- adlnlp/doc_gcn
- Data:
- DocBank, FUNSD, PubLayNet
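
The pipeline outlined in the abstract (one graph per feature aspect, a graph convolution over each, concatenation of the per-component features, then a 2-layer MLP classifier) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the shared toy adjacency matrix, the feature dimensions, and the random untrained weights are all assumptions made for illustration, and the node-level pooling step is left out because the abstract does not specify it.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in m]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the standard GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def random_matrix(rows, cols, rng):
    """Hypothetical untrained weights; a real model would learn these."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(A_norm X W)."""
    return relu(matmul(matmul(normalize_adjacency(adj), feats), weight))


def classify_components(aspects, hidden_dim, num_classes, seed=0):
    """aspects: list of (adjacency, node_features) pairs over the same components."""
    rng = random.Random(seed)
    enhanced = []
    for adj, feats in aspects:
        weight = random_matrix(len(feats[0]), hidden_dim, rng)
        enhanced.append(gcn_layer(adj, feats, weight))
    # Concatenate the enhanced features of every aspect for each component.
    num_nodes = len(enhanced[0])
    fused = [sum((e[i] for e in enhanced), []) for i in range(num_nodes)]
    # 2-layer MLP producing one score per layout class for each component.
    w1 = random_matrix(len(fused[0]), hidden_dim, rng)
    w2 = random_matrix(hidden_dim, num_classes, rng)
    return matmul(relu(matmul(fused, w1)), w2)


# Toy page with three layout components connected in a chain (assumed data).
rng = random.Random(1)
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
aspect_dims = {"syntactic": 3, "semantic": 4, "density": 2, "appearance": 5}
aspects = [(adj, random_matrix(3, dim, rng)) for dim in aspect_dims.values()]
logits = classify_components(aspects, hidden_dim=4, num_classes=5)
```

Here `logits` holds one row of class scores per layout component. In practice each aspect would have its own graph structure and learned weights rather than the shared toy adjacency and random matrices used above.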