Chargrid: Towards Understanding 2D Documents
Anoop R Katti, Christian Reisswig, Cordula Guder, Sebastian Brarda, Steffen Bickel, Johannes Höhne, Jean Baptiste Faddoul
Abstract
We introduce a novel type of text representation that preserves the 2D layout of a document. This is achieved by encoding each document page as a two-dimensional grid of characters. Based on this representation, we present a generic document understanding pipeline for structured documents. This pipeline makes use of a fully convolutional encoder-decoder network that predicts a segmentation mask and bounding boxes. We demonstrate its capabilities on an information extraction task from invoices and show that it significantly outperforms approaches based on sequential text or document images.- Anthology ID:
- D18-1476
- Volume:
- Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
- Month:
- October-November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Ellen Riloff, David Chiang, Julia Hockenmaier, Jun’ichi Tsujii
- Venue:
- EMNLP
- SIG:
- SIGDAT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 4459–4469
- Language:
- URL:
- https://aclanthology.org/D18-1476
- DOI:
- 10.18653/v1/D18-1476
- Cite (ACL):
- Anoop R Katti, Christian Reisswig, Cordula Guder, Sebastian Brarda, Steffen Bickel, Johannes Höhne, and Jean Baptiste Faddoul. 2018. Chargrid: Towards Understanding 2D Documents. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4459–4469, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Chargrid: Towards Understanding 2D Documents (Katti et al., EMNLP 2018)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-4/D18-1476.pdf
- Code
- additional community code