GVdoc - Graph-based Visual DOcument Classification

Fnu Mohbat; Mohammed J. Zaki; Catherine Finegan-Dollak; Ashish Verma

doi:10.18653/v1/2023.findings-acl.329

Abstract

The robustness of a model for real-world deployment is decided by how well it performs on unseen data and distinguishes between in-domain and out-of-domain samples. Visual document classifiers have shown impressive performance on in-distribution test sets. However, they tend to struggle to correctly classify and differentiate out-of-distribution examples. Image-based classifiers lack the text component, whereas multi-modality transformer-based models face the token serialization problem in visual documents due to their diverse layouts. They also require a lot of computing power during inference, making them impractical for many real-world applications. We propose GVdoc, a graph-based document classification model that addresses both of these challenges. Our approach generates a document graph based on its layout, and then trains a graph neural network to learn node and graph embeddings. Through experiments, we show that our model, even with fewer parameters, outperforms state-of-the-art models on out-of-distribution data while retaining comparable performance on the in-distribution test set.
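The abstract's two-step idea (build a graph from the document layout, then learn node and graph embeddings) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how edges are chosen or which graph neural network is used, so the k-nearest-neighbor edge rule, the unweighted mean aggregation, and all function names below (`box_center`, `layout_knn_edges`, `mean_aggregate`, `graph_embedding`) are assumptions made purely for illustration. Bounding boxes are assumed to be `(x0, y0, x1, y1)` tuples, such as an OCR engine might return.

```python
import math


def box_center(box):
    """Center (x, y) of a bounding box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def layout_knn_edges(boxes, k=2):
    """Link each box to its k nearest boxes by center distance.

    Assumption: k-NN is a stand-in edge rule, not the paper's.
    Returns a sorted list of undirected edges (i, j) with i < j.
    """
    centers = [box_center(b) for b in boxes]
    edges = set()
    for i, ci in enumerate(centers):
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(centers) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def mean_aggregate(features, edges):
    """One round of message passing without learned weights.

    Each node's new feature is the mean over itself and its neighbors.
    A trained graph neural network would add learned transformations.
    """
    neighbors = [[i] for i in range(len(features))]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dim = len(features[0])
    return [
        [sum(features[m][d] for m in nbrs) / len(nbrs) for d in range(dim)]
        for nbrs in neighbors
    ]


def graph_embedding(node_features):
    """Mean-pool node features into one graph-level vector."""
    n = len(node_features)
    dim = len(node_features[0])
    return [sum(f[d] for f in node_features) / n for d in range(dim)]
```

In a classifier along these lines, the pooled graph vector would be passed to a classification layer, and the node features would come from text and layout encoders rather than the toy numbers a quick experiment might use.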

Anthology ID: 2023.findings-acl.329
Volume: Findings of the Association for Computational Linguistics: ACL 2023
Month: July
Year: 2023
Address: Toronto, Canada
Editors: Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 5342–5357
URL: https://aclanthology.org/2023.findings-acl.329
DOI: 10.18653/v1/2023.findings-acl.329
Cite (ACL): Fnu Mohbat, Mohammed J Zaki, Catherine Finegan-Dollak, and Ashish Verma. 2023. GVdoc - Graph-based Visual DOcument Classification. In Findings of the Association for Computational Linguistics: ACL 2023, pages 5342–5357, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal): GVdoc - Graph-based Visual DOcument Classification (Mohbat et al., Findings 2023)
PDF: https://preview.aclanthology.org/corrections-2024-04/2023.findings-acl.329.pdf
Video: https://preview.aclanthology.org/corrections-2024-04/2023.findings-acl.329.mp4
