Beyond Layout Embedding: Layout Attention with Gaussian Biases for Structured Document Understanding
Xi Zhu, Xue Han, Shuyuan Peng, Shuo Lei, Chao Deng, Junlan Feng
Abstract
Effectively encoding layout information is a central problem in structured document understanding. Most existing methods rely on millions of trainable parameters to learn a layout feature for each word from its Cartesian coordinates. However, two questions remain unresolved: (1) Is the Cartesian coordinate system the optimal choice for layout modeling? (2) Are massive learnable parameters truly necessary for layout representation? In this paper, we address these questions by proposing Layout Attention with Gaussian Biases (LAGaBi). First, we find that polar coordinates are a better choice than Cartesian coordinates: they measure both the distance and the angle between word pairs, capturing relative positions more effectively. Second, by feeding these distances and angles into 2-D Gaussian kernels, we model an intuitive layout inductive bias, namely that words closer together in a document should attend to each other more; the resulting biases revise the textual attention distribution. LAGaBi is model-agnostic and language-independent, and can be applied to a range of Transformer-based models, from text pre-training models in the BERT series to the LayoutLM series, which additionally incorporates visual features. Experimental results on three widely used benchmarks demonstrate that, despite reducing the number of layout parameters from millions to 48, LAGaBi achieves competitive or even superior performance.
- Anthology ID:
- 2023.findings-emnlp.521
- Volume:
- Findings of the Association for Computational Linguistics: EMNLP 2023
- Month:
- December
- Year:
- 2023
- Address:
- Singapore
- Editors:
- Houda Bouamor, Juan Pino, Kalika Bali
- Venue:
- Findings
- Publisher:
- Association for Computational Linguistics
- Pages:
- 7773–7784
- URL:
- https://aclanthology.org/2023.findings-emnlp.521
- DOI:
- 10.18653/v1/2023.findings-emnlp.521
- Cite (ACL):
- Xi Zhu, Xue Han, Shuyuan Peng, Shuo Lei, Chao Deng, and Junlan Feng. 2023. Beyond Layout Embedding: Layout Attention with Gaussian Biases for Structured Document Understanding. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 7773–7784, Singapore. Association for Computational Linguistics.
- Cite (Informal):
- Beyond Layout Embedding: Layout Attention with Gaussian Biases for Structured Document Understanding (Zhu et al., Findings 2023)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2023.findings-emnlp.521.pdf
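The layout-bias mechanism the abstract describes (pairwise polar coordinates fed through a 2-D Gaussian kernel, with the result added to the textual attention logits) can be sketched as follows. This is a minimal, hypothetical NumPy illustration under assumptions of our own, not the authors' implementation: the function names and the per-dimension `(mu, sigma)` parameterization of the Gaussian kernel are assumed for the sketch.

```python
import numpy as np

def gaussian_layout_bias(centers, mu, sigma):
    """Attention bias from pairwise polar coordinates via a 2-D Gaussian kernel.

    centers: (n, 2) word bounding-box centers on the page.
    mu, sigma: (2,) Gaussian mean and scale over the (distance, angle)
    dimensions -- a hypothetical parameterization, not the paper's exact one.
    """
    diff = centers[:, None, :] - centers[None, :, :]   # (n, n, 2) pairwise offsets
    dist = np.linalg.norm(diff, axis=-1)               # radial distance r
    angle = np.arctan2(diff[..., 1], diff[..., 0])     # polar angle theta
    polar = np.stack([dist, angle], axis=-1)           # (n, n, 2) polar coordinates
    # Log of a 2-D Gaussian kernel: pairs whose distance/angle lie near the
    # kernel mean receive a larger (less negative) bias, so nearby words
    # get more attention, matching the inductive bias in the abstract.
    z = (polar - mu) / sigma
    return -0.5 * np.sum(z ** 2, axis=-1)              # (n, n) additive bias

def biased_attention(scores, centers, mu, sigma):
    """Revise textual attention scores with the layout bias, then softmax."""
    logits = scores + gaussian_layout_bias(centers, mu, sigma)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)
```

With uniform textual scores and a zero-mean kernel, a word attends more to a neighbor 1 unit away than to one 10 units away, which is the "closer words receive more attention" behavior; in a full model one such bias (with its handful of kernel parameters per head) would be added to each head's attention logits.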