@inproceedings{xu-etal-2021-contrastive-document,
title = "Contrastive Document Representation Learning with Graph Attention Networks",
author = "Xu, Peng and
Chen, Xinchi and
Ma, Xiaofei and
Huang, Zhiheng and
Xiang, Bing",
editor = "Moens, Marie-Francine and
Huang, Xuanjing and
Specia, Lucia and
Yih, Scott Wen-tau",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = nov,
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/fix-sig-urls/2021.findings-emnlp.327/",
doi = "10.18653/v1/2021.findings-emnlp.327",
pages = "3874--3884",
abstract = "Recent progress in pretrained Transformer-based language models has shown great success in learning contextual representation of text. However, due to the quadratic self-attention complexity, most of the pretrained Transformers models can only handle relatively short text. It is still a challenge when it comes to modeling very long documents. In this work, we propose to use a graph attention network on top of the available pretrained Transformers model to learn document embeddings. This graph attention network allows us to leverage the high-level semantic structure of the document. In addition, based on our graph document model, we design a simple contrastive learning strategy to pretrain our models on a large amount of unlabeled corpus. Empirically, we demonstrate the effectiveness of our approaches in document classification and document retrieval tasks."
}
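
The abstract describes the approach at a high level: sentence-level embeddings from a pretrained Transformer are connected into a document graph, a graph attention network aggregates them into a document embedding, and the encoder is pretrained with a contrastive objective on unlabeled text. Below is a minimal, hypothetical PyTorch sketch of that kind of pipeline, not the authors' implementation: the single-head graph attention layer, the fully connected sentence graph, the mean pooling, the dropout-based views, and the InfoNCE temperature are all illustrative assumptions, and the random tensors stand in for real sentence embeddings from a frozen Transformer.

```python
# Minimal sketch (not the paper's code): a graph-attention document encoder over
# precomputed sentence embeddings, trained with an InfoNCE-style contrastive loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphAttentionLayer(nn.Module):
    """Single-head graph attention (Velickovic et al., 2018) in plain PyTorch."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, in_dim); adj: (num_nodes, num_nodes), 1 where an edge exists.
        h = self.proj(x)                                    # (N, out_dim)
        n = h.size(0)
        pairs = torch.cat(
            [h.unsqueeze(1).expand(n, n, -1), h.unsqueeze(0).expand(n, n, -1)], dim=-1
        )                                                   # (N, N, 2 * out_dim)
        scores = F.leaky_relu(self.attn(pairs).squeeze(-1), negative_slope=0.2)
        scores = scores.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=-1)               # attention over neighbors
        return F.elu(alpha @ h)


class GraphDocEncoder(nn.Module):
    """Document embedding = mean-pooled GAT output over sentence nodes."""

    def __init__(self, sent_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        self.gat = GraphAttentionLayer(sent_dim, hidden_dim)

    def forward(self, sent_emb: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        return self.gat(sent_emb, adj).mean(dim=0)          # (hidden_dim,)


def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss: matching rows of z1/z2 are positives, other rows negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    torch.manual_seed(0)
    encoder = GraphDocEncoder()
    # Stand-in for sentence embeddings from a frozen pretrained Transformer:
    # a batch of 4 documents, each with 10 sentence vectors of size 768.
    docs = [torch.randn(10, 768) for _ in range(4)]
    adj = torch.ones(10, 10)                                # fully connected sentence graph
    # Two "views" per document (here: sentence-embedding dropout) for contrastive pretraining.
    view1 = torch.stack([encoder(F.dropout(d, 0.1), adj) for d in docs])
    view2 = torch.stack([encoder(F.dropout(d, 0.1), adj) for d in docs])
    loss = info_nce(view1, view2)
    loss.backward()
    print(f"contrastive loss: {loss.item():.4f}")
```

In practice the document graph construction, the augmentation used to form the two views, and the pooling would follow the paper's actual design rather than these placeholders.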
Markdown (Informal)
[Contrastive Document Representation Learning with Graph Attention Networks](https://aclanthology.org/2021.findings-emnlp.327/) (Xu et al., Findings 2021)