Sebastian Brarda
2018
Chargrid: Towards Understanding 2D Documents
Anoop R Katti, Christian Reisswig, Cordula Guder, Sebastian Brarda, Steffen Bickel, Johannes Höhne, Jean Baptiste Faddoul
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
We introduce a novel type of text representation that preserves the 2D layout of a document. This is achieved by encoding each document page as a two-dimensional grid of characters. Based on this representation, we present a generic document understanding pipeline for structured documents. This pipeline makes use of a fully convolutional encoder-decoder network that predicts a segmentation mask and bounding boxes. We demonstrate its capabilities on an information extraction task from invoices and show that it significantly outperforms approaches based on sequential text or document images.
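The abstract describes encoding a document page as a two-dimensional grid of characters. Below is a minimal sketch of such an encoding. It assumes an OCR step has already produced each character with an end-exclusive pixel bounding box. The input format, the index reserved for unknown characters, and the helper names `build_chargrid` and `one_hot` are hypothetical and are not taken from the paper. Every cell covered by a character's box is filled with that character's vocabulary index, and background cells stay at 0:

```python
from typing import Dict, List, Tuple

# Hypothetical OCR output: (character, (x0, y0, x1, y1)) with end-exclusive pixel coordinates.
CharBox = Tuple[str, Tuple[int, int, int, int]]

BACKGROUND = 0  # cells not covered by any character
UNKNOWN = 1     # assumed index for characters missing from the vocabulary


def build_chargrid(chars: List[CharBox], width: int, height: int,
                   vocab: Dict[str, int]) -> List[List[int]]:
    """Return a height x width grid of character indices for one page."""
    grid = [[BACKGROUND] * width for _ in range(height)]
    for ch, (x0, y0, x1, y1) in chars:
        idx = vocab.get(ch, UNKNOWN)
        # Fill every cell covered by the character's box, clipped to the page bounds.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = idx
    return grid


def one_hot(grid: List[List[int]], num_classes: int) -> List[List[List[int]]]:
    """Expand each cell's index into a one-hot channel vector (height x width x num_classes)."""
    return [[[1 if value == c else 0 for c in range(num_classes)] for value in row]
            for row in grid]
```

As an example, place `"A"` at `(0, 0, 2, 2)` and `"B"` at `(2, 0, 4, 2)` on a page 5 cells wide and 3 cells high, with `vocab={"A": 2, "B": 3}`. The resulting rows are `[2, 2, 3, 3, 0]`, `[2, 2, 3, 3, 0]`, and `[0, 0, 0, 0, 0]`. One-hot encoding then turns each index into its own channel, which is an input shape a fully convolutional network can consume. Downsampling and vocabulary construction are left out of this sketch.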
2017
Sequential Attention: A Context-Aware Alignment Function for Machine Reading
Sebastian Brarda, Philip Yeres, Samuel Bowman
Proceedings of the 2nd Workshop on Representation Learning for NLP
In this paper we propose a neural network model with a novel Sequential Attention layer that extends soft attention by assigning weights to words in an input sequence in a way that takes into account not just how well that word matches a query, but how well surrounding words match. We evaluate this approach on the task of reading comprehension (on the Who did What and CNN datasets) and show that it dramatically improves a strong baseline—the Stanford Reader—and is competitive with the state of the art.
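A toy scorer can show how standard soft attention differs from the sequential variant described above. This is a simplified sketch, not the paper's model. It replaces the paper's Sequential Attention layer with a fixed moving average that combines the match scores of adjacent words. The `window` parameter and all function names are hypothetical. Under this scheme, a word's weight depends on how well its neighbors match the query, not only on its own match score:

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def match_scores(words: List[List[float]], query: List[float]) -> List[float]:
    """Dot-product match between each word vector and the query vector."""
    return [sum(w * q for w, q in zip(word, query)) for word in words]


def standard_attention(words: List[List[float]], query: List[float]) -> List[float]:
    """Soft attention: each weight depends only on the word's own match score."""
    return softmax(match_scores(words, query))


def sequential_attention_sketch(words: List[List[float]], query: List[float],
                                window: int = 1) -> List[float]:
    """Illustrative stand-in: average each word's match score with its neighbors' scores.

    The fixed moving average is an assumption; the paper uses a neural layer for this step.
    """
    raw = match_scores(words, query)
    n = len(raw)
    contextual = []
    for i in range(n):
        # Neighborhood [i - window, i + window], clipped at the sequence edges.
        lo, hi = max(0, i - window), min(n, i + window + 1)
        contextual.append(sum(raw[lo:hi]) / (hi - lo))
    return softmax(contextual)
```

Consider a one-dimensional query `[1.0]` and word vectors `[[2.0], [0.0], [2.0], [0.0], [0.0]]`. The second word does not match the query by itself. It still receives the largest weight under `sequential_attention_sketch(..., window=1)` because both of its neighbors match. `standard_attention` gives that same word one of the smallest weights. A learned contextual layer can capture richer patterns than this fixed average.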
Co-authors
- Anoop R Katti (1 paper)
- Christian Reisswig (1 paper)
- Cordula Guder (1 paper)
- Steffen Bickel (1 paper)
- Johannes Höhne (1 paper)