Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval

Zijing Ou, Qinliang Su, Jianxing Yu, Bang Liu, Jingwen Wang, Ruihui Zhao, Changyou Chen, Yefeng Zheng


Abstract
Driven by the need for fast retrieval and a small memory footprint, document hashing plays a crucial role in large-scale information retrieval. Generating high-quality hash codes requires both semantic and neighborhood information. However, most existing methods leverage only one of the two, or combine them through intuitive criteria, without a theoretical principle to guide the integration. In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution and propose to integrate the two types of information with a graph-driven generative model. To deal with the complicated correlations among documents, we further propose a tree-structured approximation method for learning. Under this approximation, we prove that the training objective can be decomposed into terms involving only single documents or pairs of documents, enabling the model to be trained as efficiently as models that treat documents as uncorrelated. Extensive experimental results on three benchmark datasets show that our method achieves superior performance over state-of-the-art methods, demonstrating the effectiveness of the proposed model for simultaneously preserving semantic and neighborhood information.
Anthology ID:
2021.acl-long.174
Volume:
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
August
Year:
2021
Address:
Online
Venues:
ACL | IJCNLP
Publisher:
Association for Computational Linguistics
Pages:
2238–2249
URL:
https://aclanthology.org/2021.acl-long.174
DOI:
10.18653/v1/2021.acl-long.174
Cite (ACL):
Zijing Ou, Qinliang Su, Jianxing Yu, Bang Liu, Jingwen Wang, Ruihui Zhao, Changyou Chen, and Yefeng Zheng. 2021. Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 2238–2249, Online. Association for Computational Linguistics.
Cite (Informal):
Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval (Ou et al., ACL-IJCNLP 2021)
PDF:
https://preview.aclanthology.org/remove-xml-comments/2021.acl-long.174.pdf
Optional supplementary material:
 2021.acl-long.174.OptionalSupplementaryMaterial.zip
Video:
 https://preview.aclanthology.org/remove-xml-comments/2021.acl-long.174.mp4
Code:
 J-zin/SNUH